[Cpp API Compatibility] Add Generator related APIs#78070
[Cpp API Compatibility] Add Generator related APIs#78070SigureMo merged 18 commits intoPaddlePaddle:developfrom
Generator related APIs#78070Conversation
|
你的PR提交成功,感谢你对开源项目的贡献! |
There was a problem hiding this comment.
Pull request overview
This PR introduces a PyTorch C++ compatibility surface for at::Generator (and supporting c10 types) by bridging to Paddle’s phi::Generator, enabling code that expects PyTorch’s generator/dispatch-key APIs to compile against Paddle.
Changes:
- Added a
c10::intrusive_ptrcompatibility type implemented as a wrapper overstd::shared_ptr. - Added
c10::GeneratorImplandat::Generatorcompatibility headers that wrapphi::Generator. - Added
c10::DispatchKey/c10::DispatchKeySetcompatibility headers to support generator dispatch-key queries.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| paddle/phi/api/include/compat/c10/util/intrusive_ptr.h | Introduces shared_ptr-backed c10::intrusive_ptr for API compatibility. |
| paddle/phi/api/include/compat/c10/core/GeneratorImpl.h | Implements a GeneratorImpl wrapper around phi::Generator. |
| paddle/phi/api/include/compat/ATen/core/Generator.h | Adds at::Generator facade and helper utilities (clone/state/mutex). |
| paddle/phi/api/include/compat/c10/core/DispatchKey.h | Adds dispatch key enums/util declarations for c10 compat. |
| paddle/phi/api/include/compat/c10/core/DispatchKeySet.h | Adds DispatchKeySet bitset/iterator utilities and related declarations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/re-run all-failed |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #78070 +/- ##
==========================================
Coverage ? 96.09%
==========================================
Files ? 7
Lines ? 384
Branches ? 0
==========================================
Hits ? 369
Misses ? 15
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/re-run all-failed |
| * Wraps phi::DefaultCUDAGenerator for the given device and exposes the | ||
| * PyTorch-compatible Philox-based random number generator interface. | ||
| */ | ||
| struct CUDAGeneratorImpl : public c10::GeneratorImpl { |
There was a problem hiding this comment.
这里面的实现是自己实现的还是完全fellow torch的实现?
There was a problem hiding this comment.
fellow torch的实现,部分报错的依赖再移植到paddle
265ebac to
8386346
Compare
|
/re-run all-failed |
|
/re-run all-failed |
- intrusive_ptr.h: Add missing headers <cstdint> and <type_traits> - intrusive_ptr.h: Mark release() as deprecated to prevent dangling pointer issues - DispatchKey.h: Add inline implementations for toString, operator<<, toBackendComponent - DispatchKeySet.h: Implement iterator::operator++() - DispatchKeySet.h: Add toString, operator<<, isBackendDispatchKey - DispatchKeySet.h: Add getRuntimeDispatchKeySet, runtimeDispatchKeySetHas - DispatchKeySet.h: Add getBackendKeySetFromAutograd, initializeFunctionalityOffsetsAndMasks - Generator.h and CUDAGeneratorImpl.h: Already have required APIs (getDefaultCUDAGenerator, createCUDAGenerator, get_generator_or_default) All declared functions now have corresponding inline implementations. 🤖 Generated with Claude Code
|
/re-run all-failed |
2 similar comments
|
/re-run all-failed |
|
/re-run all-failed |
SigureMo
left a comment
There was a problem hiding this comment.
可以看看 @ShigureNyako 的建议,如果合理的话可以改一下,不合理的话说明一下原因即可,这里检查两个框架之间的细节差异可能 Agent 比我更专业了 😂
ok |
- intrusive_ptr.h: Add missing headers <cstdint> and <type_traits> - intrusive_ptr.h: Mark release() as deprecated to prevent dangling pointer issues - DispatchKey.h: Add inline implementations for toString, operator<<, toBackendComponent - DispatchKeySet.h: Implement iterator::operator++() - DispatchKeySet.h: Add toString, operator<<, isBackendDispatchKey - DispatchKeySet.h: Add getRuntimeDispatchKeySet, runtimeDispatchKeySetHas - DispatchKeySet.h: Add getBackendKeySetFromAutograd, initializeFunctionalityOffsetsAndMasks - Generator.h and CUDAGeneratorImpl.h: Already have required APIs (getDefaultCUDAGenerator, createCUDAGenerator, get_generator_or_default) All declared functions now have corresponding inline implementations. 🤖 Generated with Claude Code
…ddle#78070](https://github.com/youge325/Paddle/issues/78070) (Round 2) Replace the std::shared_ptr wrapper implementation with true intrusive reference counting to fix unsafeReleaseGeneratorImpl semantics. Changes: - intrusive_ptr.h: Complete rewrite with intrusive_ptr_target base class - Atomic combined refcount/weakcount for thread safety - Proper release() that transfers ownership (clears intrusive_ptr) - reclaim() for adopting raw pointers without incref - weak_intrusive_ptr with proper lock() semantics - GeneratorImpl.h: Now inherits from intrusive_ptr_target instead of using std::shared_ptr, enabling proper intrusive refcounting - CUDAGeneratorImpl.h: Verified compatible with new implementation - cuda_generator_test.cc: Add 3 regression tests for AC-3: - UnsafeReleaseMakesGeneratorUndefined: verifies defined() -> false - UnsafeReleaseAndReclaim: verifies no double-free on reclaim - UnsafeReleaseAndReclaimRoundTrip: full release/reclaim workflow All tests pass: - c10: 10/10 - ATen: 33/33
The explicit intrusive_ptr constructor now stores kUniqueRef (strong=1, weak=1) directly instead of calling retain_(), matching PyTorch semantics. make_intrusive() now uses this constructor rather than reclaim(), so newly created objects have use_count()==1 instead of 0. Previously, make_intrusive() called reclaim() which skipped incrementing the refcount, leaving combined_refcount_=0. This caused: - use_count() to return 0 on freshly created objects - memory leak on destruction (fetch_sub underflow, delete never triggered) - use-after-free when copies were made before the original was destroyed Three new tests verify the corrected behaviour: MakeIntrusiveInitialRefcountIsOne, CopyIntrusivePtrIncrementsRefcount, MoveIntrusivePtrKeepsRefcount. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion When a per-backend functionality key's backend loop exhausts all set backend bits, the iterator now continues scanning for the next functionality key (via continue) instead of returning an invalid state where current_dispatchkey_idx_ points to a per-backend key but current_backendcomponent_idx_ is end_iter_key_val. Previously, operator++ would return after resetting next_backend_ and current_backendcomponent_idx_, which caused operator*() to enter the isPerBackendFunctionalityKey branch with an invalid backend index and trigger TORCH_INTERNAL_ASSERT. Now the loop continues cleanly to the next functionality key or reaches the end state. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…offset BackendComponent is 1-based (InvalidBit=0, CPUBit=1, CUDABit=2, ...), while the iterator's next_backend_ is a 0-based bit index into repr_. The bit at position next_backend_ corresponds to BackendComponent value next_backend_+1, matching the DispatchKeySet(BackendComponent) constructor which stores bit (k-1) for component k. Previously current_backendcomponent_idx_ was set to next_backend_ (the raw bit index), causing static_cast<BackendComponent>(idx) to always yield InvalidBit for the first backend, making operator*() return DispatchKey::Undefined for all CPU/CUDA/etc. per-backend keys. Fix: store next_backend_ + 1 so the cast to BackendComponent produces the correct enum value and toRuntimePerBackendFunctionalityKey returns the intended runtime key (e.g. DispatchKey::CPU). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove dead code (make_intrusive_wrapper class), redundant comments, and section-header comments that restate function names across intrusive_ptr.h, DispatchKeySet.h, and c10_generator_impl_test.cc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
/re-run all-failed |
来自 @codex 的结果: 汇总
结论很直接:
所以问题是 |
…ice_type validation Fixes two blocking issues from ShigureNyako's latest CHANGES_REQUESTED review on PR PaddlePaddle#78070 (Generator compatibility layer): 1. **intrusive_ptr two-phase lifecycle**: `reset_()` and `raw::intrusive_ptr::decref()` previously called `delete target_` as soon as the strong refcount reached zero, ignoring any surviving `weak_intrusive_ptr` instances and causing use-after-free. - Fix: when strong count drops to 0, release the implicit weak reference (the +1 encoded in `kUniqueRef`) and only `delete` when weak count also reaches 0. - Fix: `weak_intrusive_ptr::reset_()` now triggers `delete` when weak count reaches 0. - Fix: `raw::weak_intrusive_ptr::decref()` same pattern. - Design: aligned with PyTorch's two-phase lifecycle where strong refs count as +1 to weakcount via `kUniqueRef = kReferenceCountOne | kWeakReferenceCountOne`. 2. **check_generator device_type validation**: `check_generator<T>` was missing the `T::device_type() == gen->device().type()` guard present in PyTorch, allowing silent undefined-behavior casts for wrong-backend generators. - Fix: add `TORCH_CHECK` using `phi::AllocationTypeStr` for clear error messages. New tests added: - `c10_intrusive_ptr_lifecycle_test.cc`: 6 tests covering weak-ref lifecycle, two-phase deletion, lock() behavior, multiple copies, raw API path. - `cuda_generator_test.cc`: 2 new tests for device_type mismatch and matching. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Document that bit 63 in combined_refcount_ is reserved for kHasPyObject (PyTorch API compatibility); compat layer does not use it but masks it out to match PyTorch's weakcount extraction. - Add comment on unsafe_adopt explaining it is a PyTorch API alias for reclaim(); prefer reclaim() in new code. - Remove dead null guard on destroy_count_ in TestTarget destructor; all callers always provide a non-null pointer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add CheckGeneratorThrowsOnUndefined test: constructs an undefined Generator (at::Generator()) and verifies check_generator<T> throws on the gen->defined() TORCH_CHECK branch, completing AC-2's negative test matrix (nullopt / undefined / device-type-mismatch all covered). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
/re-run all-failed |
Generator related APIs
|
/re-run all-failed |
1 similar comment
|
/re-run all-failed |

PR Category
Execute Infrastructure
PR Types
New features
Description
新增
at::Generatorat::cuda::detail::getDefaultCUDAGeneratorat::get_generator_or_default<at::CUDAGeneratorImpl>接口及其依赖文档详见 PFCCLab/PaddleCppAPITest#49
是否引起精度变化
否