feat: Add syncall and tprefetch_async PTO ops #688

Open
FangRui0 wants to merge 1 commit into hw-native-sys:main from FangRui0:add_newop

Conversation

@FangRui0
Contributor

No description provided.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the pto.tprefetch_async and pto.syncall operations, providing support for asynchronous global memory prefetching and cross-core synchronization. The changes include ODS definitions, IR verification logic, EmitC lowering patterns, and comprehensive documentation, alongside a new guide for adding PTO operations. However, the reviewer identified two critical issues where the implementations for the static helper functions isByteIntegerType and verifyAsyncFlatContiguous1DGMViewLike are missing from lib/PTO/IR/PTO.cpp, which will cause compilation failures.

Comment thread lib/PTO/IR/PTO.cpp
static SmallVector<int64_t, 4> getShapeVec(Type ty);
static SmallVector<int64_t, 4> getValidShapeVec(Type ty);
static SmallVector<int64_t, 4> getValidShapeVec(Value value);
static bool isByteIntegerType(Type ty);

critical

The implementation of the static helper function isByteIntegerType is missing from this file, which will lead to a compilation error. Please provide the implementation, for example:

// Returns true if `ty` is an integer type that is exactly 8 bits wide.
static bool isByteIntegerType(Type ty) {
  auto intTy = dyn_cast<IntegerType>(ty);
  return intTy && intTy.getWidth() == 8;
}

Comment thread lib/PTO/IR/PTO.cpp
Comment on lines +143 to +145
static LogicalResult verifyAsyncFlatContiguous1DGMViewLike(Operation *op,
                                                           Value value,
                                                           StringRef name);

critical

The implementation of the static helper function verifyAsyncFlatContiguous1DGMViewLike is missing from this file. This will cause a compilation error as it is called in TPrefetchAsyncOp::verify(). Please add the implementation to the file.

@reedhecre

reedhecre commented May 20, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #688 feat: Add syncall and tprefetch_async PTO ops
  • Author: FangRui0
  • Base/Head: main / add_newop
  • Head SHA: f88ce4076967
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-20T07:35:23Z
  • Previous Head SHA: 0f748d469860
  • Status: completed

Summary

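Found 3 defects that affect the usability of the new interfaces: a class of memrefs passes tprefetch_async verification but then fails in the EmitC lowering stage, the rank-1 tile_buf form of syncall cannot be lowered, and the tensor_view input that the tprefetch_async documentation declares cannot actually be constructed.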

Findings

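  1. P1 `tprefetch_async` admits dynamic-layout memrefs that the EmitC lowering cannot handle lib/PTO/IR/PTO.cpp:2974

verifyAsyncFlatContiguous1DGMMemRef() returns success() as soon as it detects a dynamic stride or offset, so a 1D GM memref with a dynamic layout passes verification. However, PTOTPrefetchAsyncToEmitC always calls buildGlobalTensorFromMemref() on the memref path, and that helper accepts only a static shape with a static layout; when it fails, the PTOToEmitC stage reports "failed to build GlobalTensor src". In other words, a class of tprefetch_async inputs that the verifier explicitly admits fails deterministically during code generation, which is a frontend-visible compilation blocker.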
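
One way to close this gap is to make the verifier reject whatever the lowering cannot build. The following is only a minimal, self-contained sketch of that idea and does not use MLIR: `Layout1D`, `Verdict`, `kDynamic`, and `verifyFlatContiguous1D` are hypothetical names invented for illustration, and the unit-stride requirement is an assumption about what "flat contiguous" means here.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a "dynamic" sentinel; not MLIR's real constant.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Hypothetical, simplified description of a 1-D memref: size, stride, offset.
struct Layout1D {
  int64_t size;
  int64_t stride;
  int64_t offset;
};

enum class Verdict { Accept, RejectDynamic, RejectStrided };

// Accept only what the lowering is reported to handle: a static shape with a
// static layout. Any dynamic component is rejected up front instead of being
// waved through, so verification and code generation agree.
Verdict verifyFlatContiguous1D(const Layout1D &layout) {
  if (layout.size == kDynamic || layout.stride == kDynamic ||
      layout.offset == kDynamic)
    return Verdict::RejectDynamic;
  // Assumption: "flat contiguous" means unit stride.
  if (layout.stride != 1)
    return Verdict::RejectStrided;
  return Verdict::Accept;
}
```

The alternative fix, teaching the lowering to handle dynamic layouts, would keep the verifier permissive; the sketch only illustrates the stricter-verifier option.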

  2. P2 `syncall` claims to support rank-1 `tile_buf` workspaces, but that form cannot be lowered to C++ lib/PTO/IR/PTO.cpp:12138

verifySyncAllTileWorkspace() explicitly accepts rank-1 or rank-2 tile_buf/memref workspaces. However, the EmitC type conversion supports only rank-2 TileBufType (getEmitCTileTypeString() returns nullopt for any tile that is not 2-D), and buildSyncAllWorkspaceTileValue() provides a 1D -> 2D fallback path only for memref. As a result, syncall forms such as !pto.tile_buf<vec, Nxi32> / !pto.tile_buf<mat, Nxi32>, which the current verifier accepts, fail at lowering.
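
The 1D -> 2D fallback mentioned above could, in principle, be applied uniformly to both workspace kinds. The sketch below shows only the shape normalization, without MLIR; `promoteWorkspaceShapeTo2D` is a hypothetical helper, and mapping a rank-1 shape `N` to `1 x N` is an assumption, since the review does not say which 2-D shape the existing memref fallback produces.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: normalize a workspace shape to rank 2 so that a
// rank-2-only type converter can handle it.
// Assumption: a rank-1 shape {N} is promoted to {1, N}.
std::optional<std::vector<int64_t>>
promoteWorkspaceShapeTo2D(const std::vector<int64_t> &shape) {
  if (shape.size() == 2)
    return shape; // Already rank 2: pass through unchanged.
  if (shape.size() == 1)
    return std::vector<int64_t>{1, shape[0]}; // Promote N to 1 x N.
  return std::nullopt; // Other ranks are not valid workspaces.
}
```

If rank-1 tile_buf workspaces are not meant to be supported, the simpler fix is to restrict the verifier to rank 2 for tile_buf instead.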

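  3. P2 `tprefetch_async` does not actually accept the `!pto.tensor_view` input promised by the documentation include/PTO/IR/PTOOps.td:603

Both the new documentation and verifyAsyncFlatContiguous1DGMViewLike() list !pto.tensor_view as a legal input for src, but the ODS for TPrefetchAsyncOp still declares src as PTODpsType, and PTODpsType does not include TensorViewType. The frontend therefore cannot construct the documented tprefetch_async(%tensor_view, %ctx) form, which is a new inconsistency between the API and the documentation contract.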

Signed-off-by: FangRui <fangrui_95@163.com>