Skip to content

[Cpp API Compatibility] Add Generator related APIs#78070

Merged
SigureMo merged 18 commits intoPaddlePaddle:developfrom
youge325:cGenerator
Mar 22, 2026
Merged

[Cpp API Compatibility] Add Generator related APIs#78070
SigureMo merged 18 commits intoPaddlePaddle:developfrom
youge325:cGenerator

Conversation

@youge325
Copy link
Copy Markdown
Contributor

@youge325 youge325 commented Feb 28, 2026

PR Category

Execute Infrastructure

PR Types

New features

Description

新增 at::Generator at::cuda::detail::getDefaultCUDAGenerator at::get_generator_or_default<at::CUDAGeneratorImpl> 接口及其依赖
文档详见 PFCCLab/PaddleCppAPITest#49

Mermaid Chart - Create complex, visual diagrams with text -2026-03-19-082844

是否引起精度变化

Copilot AI review requested due to automatic review settings February 28, 2026 04:28
@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented Feb 28, 2026

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot Bot added the contributor External developers label Feb 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a PyTorch C++ compatibility surface for at::Generator (and supporting c10 types) by bridging to Paddle’s phi::Generator, enabling code that expects PyTorch’s generator/dispatch-key APIs to compile against Paddle.

Changes:

  • Added a c10::intrusive_ptr compatibility type implemented as a wrapper over std::shared_ptr.
  • Added c10::GeneratorImpl and at::Generator compatibility headers that wrap phi::Generator.
  • Added c10::DispatchKey / c10::DispatchKeySet compatibility headers to support generator dispatch-key queries.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
paddle/phi/api/include/compat/c10/util/intrusive_ptr.h Introduces shared_ptr-backed c10::intrusive_ptr for API compatibility.
paddle/phi/api/include/compat/c10/core/GeneratorImpl.h Implements a GeneratorImpl wrapper around phi::Generator.
paddle/phi/api/include/compat/ATen/core/Generator.h Adds at::Generator facade and helper utilities (clone/state/mutex).
paddle/phi/api/include/compat/c10/core/DispatchKey.h Adds dispatch key enums/util declarations for c10 compat.
paddle/phi/api/include/compat/c10/core/DispatchKeySet.h Adds DispatchKeySet bitset/iterator utilities and related declarations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h Outdated
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h
Comment thread paddle/phi/api/include/compat/c10/util/intrusive_ptr.h Outdated
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKey.h
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h Outdated
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h
Comment thread paddle/phi/api/include/compat/c10/util/intrusive_ptr.h
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKeySet.h
Comment thread paddle/phi/api/include/compat/c10/core/DispatchKey.h Outdated
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@youge325 youge325 changed the title [Cpp API Compatibility] add at::Generator [Cpp API Compatibility] add Generator related APIs Feb 28, 2026
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 28, 2026

Codecov Report

❌ Patch coverage is 96.09375% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@98546bd). Learn more about missing BASE report.

Files with missing lines Patch % Lines
...ddle/phi/api/include/compat/c10/core/DispatchKey.h 76.74% 10 Missing ⚠️
...le/phi/api/include/compat/c10/util/intrusive_ptr.h 94.56% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             develop   #78070   +/-   ##
==========================================
  Coverage           ?   96.09%           
==========================================
  Files              ?        7           
  Lines              ?      384           
  Branches           ?        0           
==========================================
  Hits               ?      369           
  Misses             ?       15           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Mar 1, 2026

/re-run all-failed

* Wraps phi::DefaultCUDAGenerator for the given device and exposes the
* PyTorch-compatible Philox-based random number generator interface.
*/
struct CUDAGeneratorImpl : public c10::GeneratorImpl {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里面的实现是自己实现的还是完全fellow torch的实现?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fellow torch的实现,部分报错的依赖再移植到paddle

@youge325 youge325 force-pushed the cGenerator branch 2 times, most recently from 265ebac to 8386346 Compare March 7, 2026 15:16
@youge325
Copy link
Copy Markdown
Contributor Author

youge325 commented Mar 8, 2026

/re-run all-failed

@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

youge325 added a commit to youge325/Paddle that referenced this pull request Mar 18, 2026
- intrusive_ptr.h: Add missing headers <cstdint> and <type_traits>
- intrusive_ptr.h: Mark release() as deprecated to prevent dangling pointer issues
- DispatchKey.h: Add inline implementations for toString, operator<<, toBackendComponent
- DispatchKeySet.h: Implement iterator::operator++()
- DispatchKeySet.h: Add toString, operator<<, isBackendDispatchKey
- DispatchKeySet.h: Add getRuntimeDispatchKeySet, runtimeDispatchKeySetHas
- DispatchKeySet.h: Add getBackendKeySetFromAutograd, initializeFunctionalityOffsetsAndMasks
- Generator.h and CUDAGeneratorImpl.h: Already have required APIs (getDefaultCUDAGenerator, createCUDAGenerator, get_generator_or_default)

All declared functions now have corresponding inline implementations.

🤖 Generated with Claude Code
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

2 similar comments
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

Copy link
Copy Markdown
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以看看 @ShigureNyako 的建议,如果合理的话可以改一下,不合理的话说明一下原因即可,这里检查两个框架之间的细节差异可能 Agent 比我更专业了 😂

@youge325
Copy link
Copy Markdown
Contributor Author

可以看看 @ShigureNyako 的建议,如果合理的话可以改一下,不合理的话说明一下原因即可,这里检查两个框架之间的细节差异可能 Agent 比我更专业了 😂

ok

- intrusive_ptr.h: Add missing headers <cstdint> and <type_traits>
- intrusive_ptr.h: Mark release() as deprecated to prevent dangling pointer issues
- DispatchKey.h: Add inline implementations for toString, operator<<, toBackendComponent
- DispatchKeySet.h: Implement iterator::operator++()
- DispatchKeySet.h: Add toString, operator<<, isBackendDispatchKey
- DispatchKeySet.h: Add getRuntimeDispatchKeySet, runtimeDispatchKeySetHas
- DispatchKeySet.h: Add getBackendKeySetFromAutograd, initializeFunctionalityOffsetsAndMasks
- Generator.h and CUDAGeneratorImpl.h: Already have required APIs (getDefaultCUDAGenerator, createCUDAGenerator, get_generator_or_default)

All declared functions now have corresponding inline implementations.

🤖 Generated with Claude Code
…ddle#78070](https://github.com/youge325/Paddle/issues/78070) (Round 2)

Replace the std::shared_ptr wrapper implementation with true intrusive
reference counting to fix unsafeReleaseGeneratorImpl semantics.

Changes:
- intrusive_ptr.h: Complete rewrite with intrusive_ptr_target base class
  - Atomic combined refcount/weakcount for thread safety
  - Proper release() that transfers ownership (clears intrusive_ptr)
  - reclaim() for adopting raw pointers without incref
  - weak_intrusive_ptr with proper lock() semantics
- GeneratorImpl.h: Now inherits from intrusive_ptr_target instead of using
  std::shared_ptr, enabling proper intrusive refcounting
- CUDAGeneratorImpl.h: Verified compatible with new implementation
- cuda_generator_test.cc: Add 3 regression tests for AC-3:
  - UnsafeReleaseMakesGeneratorUndefined: verifies defined() -> false
  - UnsafeReleaseAndReclaim: verifies no double-free on reclaim
  - UnsafeReleaseAndReclaimRoundTrip: full release/reclaim workflow

All tests pass:
- c10: 10/10
- ATen: 33/33
@SigureMo
Copy link
Copy Markdown
Member

image

coverage test 还真到 4h 了,1 个月前我还记得稳定 2.5h 的 test,现在怎么变这么久了,我让 agent 查一下试试

youge325 and others added 5 commits March 20, 2026 15:41
The explicit intrusive_ptr constructor now stores kUniqueRef (strong=1,
weak=1) directly instead of calling retain_(), matching PyTorch semantics.
make_intrusive() now uses this constructor rather than reclaim(), so newly
created objects have use_count()==1 instead of 0.

Previously, make_intrusive() called reclaim() which skipped incrementing
the refcount, leaving combined_refcount_=0. This caused:
- use_count() to return 0 on freshly created objects
- memory leak on destruction (fetch_sub underflow, delete never triggered)
- use-after-free when copies were made before the original was destroyed

Three new tests verify the corrected behaviour:
MakeIntrusiveInitialRefcountIsOne, CopyIntrusivePtrIncrementsRefcount,
MoveIntrusivePtrKeepsRefcount.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion

When a per-backend functionality key's backend loop exhausts all set
backend bits, the iterator now continues scanning for the next
functionality key (via continue) instead of returning an invalid state
where current_dispatchkey_idx_ points to a per-backend key but
current_backendcomponent_idx_ is end_iter_key_val.

Previously, operator++ would return after resetting next_backend_ and
current_backendcomponent_idx_, which caused operator*() to enter the
isPerBackendFunctionalityKey branch with an invalid backend index and
trigger TORCH_INTERNAL_ASSERT. Now the loop continues cleanly to the
next functionality key or reaches the end state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…offset

BackendComponent is 1-based (InvalidBit=0, CPUBit=1, CUDABit=2, ...),
while the iterator's next_backend_ is a 0-based bit index into repr_.
The bit at position next_backend_ corresponds to BackendComponent value
next_backend_+1, matching the DispatchKeySet(BackendComponent) constructor
which stores bit (k-1) for component k.

Previously current_backendcomponent_idx_ was set to next_backend_ (the
raw bit index), causing static_cast<BackendComponent>(idx) to always
yield InvalidBit for the first backend, making operator*() return
DispatchKey::Undefined for all CPU/CUDA/etc. per-backend keys.

Fix: store next_backend_ + 1 so the cast to BackendComponent produces
the correct enum value and toRuntimePerBackendFunctionalityKey returns
the intended runtime key (e.g. DispatchKey::CPU).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove dead code (make_intrusive_wrapper class), redundant comments, and
section-header comments that restate function names across intrusive_ptr.h,
DispatchKeySet.h, and c10_generator_impl_test.cc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@SigureMo
Copy link
Copy Markdown
Member

coverage test 还真到 4h 了,1 个月前我还记得稳定 2.5h 的 test,现在怎么变这么久了,我让 agent 查一下试试

来自 @codex 的结果:

汇总

runner n mean P75 P90
paddle-gpu-16 16 209.0 226.9 233.9
paddle-gpu-18 15 188.6 192.3 195.4
paddle-gpu-14 10 188.6 188.4 193.8
paddle-gpu-12 17 187.8 195.8 197.1
paddle-gpu-08 27 176.8 190.1 195.1
paddle-gpu-12-1 16 169.5 174.9 182.0
paddle-gpu-07 14 168.5 174.7 175.8
paddle-gpu-03 13 167.2 171.5 175.8
paddle-gpu-06 14 166.9 172.5 174.9
paddle-gpu-04 23 166.7 170.6 174.5
paddle-gpu-10 14 166.0 172.3 175.5
paddle-gpu-05 11 164.9 168.9 176.9
paddle-gpu-12-3 19 164.6 169.9 173.6
paddle-gpu-09 11 164.1 170.2 174.4
paddle-gpu-12-2 19 163.7 168.0 171.3
paddle-gpu-02 15 162.2 165.9 168.5
paddle-gpu-11-bj 11 162.1 167.6 170.0

结论很直接:

  • paddle-gpu-16 明显最慢,不只是 mean 高,P75/P90 也显著高,说明不是偶发一两次。
  • 第二梯队偏慢的是 paddle-gpu-18 / 14 / 12 / 08。
  • 相对快的一档是 paddle-gpu-02 / 11-bj / 12-2 / 09 / 12-3。

所以问题是 paddle-gpu-16 自己比较慢

youge325 and others added 4 commits March 21, 2026 16:27
…ice_type validation

Fixes two blocking issues from ShigureNyako's latest CHANGES_REQUESTED review
on PR PaddlePaddle#78070 (Generator compatibility layer):

1. **intrusive_ptr two-phase lifecycle**: `reset_()` and
   `raw::intrusive_ptr::decref()` previously called `delete target_` as soon
   as the strong refcount reached zero, ignoring any surviving
   `weak_intrusive_ptr` instances and causing use-after-free.
   - Fix: when strong count drops to 0, release the implicit weak reference
     (the +1 encoded in `kUniqueRef`) and only `delete` when weak count also
     reaches 0.
   - Fix: `weak_intrusive_ptr::reset_()` now triggers `delete` when weak
     count reaches 0.
   - Fix: `raw::weak_intrusive_ptr::decref()` same pattern.
   - Design: aligned with PyTorch's two-phase lifecycle where strong refs
     count as +1 to weakcount via `kUniqueRef = kReferenceCountOne |
     kWeakReferenceCountOne`.

2. **check_generator device_type validation**: `check_generator<T>` was
   missing the `T::device_type() == gen->device().type()` guard present in
   PyTorch, allowing silent undefined-behavior casts for wrong-backend
   generators.
   - Fix: add `TORCH_CHECK` using `phi::AllocationTypeStr` for clear error
     messages.

New tests added:
- `c10_intrusive_ptr_lifecycle_test.cc`: 6 tests covering weak-ref lifecycle,
  two-phase deletion, lock() behavior, multiple copies, raw API path.
- `cuda_generator_test.cc`: 2 new tests for device_type mismatch and matching.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document that bit 63 in combined_refcount_ is reserved for
  kHasPyObject (PyTorch API compatibility); compat layer does not use
  it but masks it out to match PyTorch's weakcount extraction.
- Add comment on unsafe_adopt explaining it is a PyTorch API alias for
  reclaim(); prefer reclaim() in new code.
- Remove dead null guard on destroy_count_ in TestTarget destructor;
  all callers always provide a non-null pointer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add CheckGeneratorThrowsOnUndefined test: constructs an undefined
  Generator (at::Generator()) and verifies check_generator<T> throws
  on the gen->defined() TORCH_CHECK branch, completing AC-2's negative
  test matrix (nullopt / undefined / device-type-mismatch all covered).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@SigureMo SigureMo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTMeow 🐾

@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@SigureMo SigureMo changed the title [Cpp API Compatibility] add Generator related APIs [Cpp API Compatibility] Add Generator related APIs Mar 21, 2026
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

1 similar comment
@youge325
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@SigureMo SigureMo merged commit a035361 into PaddlePaddle:develop Mar 22, 2026
142 of 153 checks passed
@youge325 youge325 deleted the cGenerator branch March 22, 2026 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor External developers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants