docs: add PTOAS usability evaluation skill by HecreReed · Pull Request #682 · hw-native-sys/PTOAS

HecreReed · 2026-05-19T02:20:59Z

Summary

add a PTOAS-specific usability evaluation skill
keep a tool-neutral source copy under skills/ptoas-usability-eval/
add direct discovery entrypoints for Codex, Cursor, Trae, and Claude Code
add a layered evaluation model so documentation review, local minimal execution, Linux compile-only, and NPU board runs are scored separately
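scope the default evaluation to scenario 01, 算子复现部署 (operator reproduction and deployment), and the PTOAS-supported subset of scenario 04, 算子基本功能实现 (basic operator functionality implementation)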
add reference guides for scope mapping, evidence collection, layer gating, and scoring rules

Directory Layout

tool-neutral source: skills/ptoas-usability-eval/
Codex entrypoint: .codex/skills/ptoas-usability-eval/
Cursor entrypoint: .cursor/skills/ptoas-usability-eval/
Trae entrypoint: .trae/skills/ptoas-usability-eval/
Claude Code entrypoint: .claude/skills/ptoas-usability-eval/
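
Because the same skill lives in one neutral source directory and four client entrypoints, the copies can drift apart. The following is a minimal sketch of a drift check, not part of this PR. It assumes each entrypoint is a plain file copy of the source directory and that `README.md` exists only in the source copy; `find_drift` and its `ignore` parameter are hypothetical names.

```python
from __future__ import annotations

import filecmp
from pathlib import Path

# Paths taken from the Directory Layout section above.
SOURCE = Path("skills/ptoas-usability-eval")
ENTRYPOINTS = [
    Path(".codex/skills/ptoas-usability-eval"),
    Path(".cursor/skills/ptoas-usability-eval"),
    Path(".trae/skills/ptoas-usability-eval"),
    Path(".claude/skills/ptoas-usability-eval"),
]


def relative_files(root: Path) -> set[str]:
    """Return every file under root as a POSIX-style path relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def find_drift(repo: Path, ignore: tuple[str, ...] = ("README.md",)) -> list[str]:
    """List differences between the source copy and each entrypoint.

    Assumption: files named in `ignore` are source-only and are skipped.
    """
    problems: list[str] = []
    src_root = repo / SOURCE
    src_files = relative_files(src_root) - set(ignore)
    for entry in ENTRYPOINTS:
        entry_root = repo / entry
        label = entry.as_posix()
        if not entry_root.is_dir():
            problems.append(f"{label}: missing entrypoint")
            continue
        entry_files = relative_files(entry_root) - set(ignore)
        for name in sorted(src_files - entry_files):
            problems.append(f"{label}: missing {name}")
        for name in sorted(entry_files - src_files):
            problems.append(f"{label}: extra {name}")
        for name in sorted(src_files & entry_files):
            # shallow=False forces a byte-by-byte comparison.
            if not filecmp.cmp(src_root / name, entry_root / name, shallow=False):
                problems.append(f"{label}: differs {name}")
    return problems
```

Calling `find_drift(Path("."))` from the repository root returns an empty list when every entrypoint matches the source copy.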

Why

PTOAS does not cleanly map to all six operator usability scenarios. This skill constrains evaluation to the parts the repo can actually evidence from its own docs, scripts, samples, and CI configuration, and marks unsupported areas as N/A instead of forcing misleading scores.

The layered model matters because bisheng and CANN compile-only checks belong to a Linux+CANN environment, while NPU board validation belongs to a device-equipped server environment. The skill now requires evaluators to declare the covered layer first and marks higher layers as 未实测 (not measured) instead of treating a missing local environment as a PTOAS usability failure.
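
The gating rule can be illustrated with a short sketch. It assumes the four layers named in the Summary are strictly ordered and that declaring a layer implies all lower layers were also covered; the `Layer` enum, the `layered_report` function, and the numeric score values are hypothetical and do not come from the skill files.

```python
from __future__ import annotations

from enum import IntEnum


class Layer(IntEnum):
    """Evaluation layers from the PR summary, lowest to highest (assumed order)."""
    DOC_REVIEW = 1
    LOCAL_MINIMAL = 2
    LINUX_COMPILE_ONLY = 3
    NPU_BOARD = 4


NOT_MEASURED = "未实测"  # label the skill uses for layers that were not run


def layered_report(covered: Layer, scores: dict[Layer, float]) -> dict[str, object]:
    """Build a per-layer report that never penalizes an unmeasured layer."""
    report: dict[str, object] = {}
    for layer in Layer:
        if layer > covered:
            # Above the declared coverage: mark as not measured, never as 0.
            report[layer.name] = NOT_MEASURED
        elif layer in scores:
            report[layer.name] = scores[layer]
        else:
            raise ValueError(f"{layer.name} is within declared coverage but has no score")
    return report
```

For example, an evaluator working on a laptop without CANN would declare `Layer.LOCAL_MINIMAL`, and the compile-only and board layers would be reported as 未实测 rather than as failures.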

Included content

SKILL.md: trigger conditions, workflow, output shape, scope boundaries, and layer gating
references/scope.md: scenario mapping, exclusions, and evaluation layers
references/evidence-checklist.md: canonical repo evidence sources, search order, and layer mapping
references/metrics-01.md: scoring guidance for scenario 01, 算子复现部署 (operator reproduction and deployment)
references/metrics-04.md: scoring guidance for the PTOAS-supported subset of 04
agents/openai.yaml: UI metadata where applicable
README.md: explains the neutral source copy and per-client entrypoints

Notes

no production code changes
no build/test impact

gemini-code-assist

Code Review

This pull request introduces a comprehensive usability evaluation framework for the PTOAS repository, including a new skill definition, detailed metrics for operator reproduction and implementation, and an evidence checklist. The review feedback suggests refining the skill description for brevity and removing redundant search commands in the evidence checklist to improve reliability and clarity.

gemini-code-assist · 2026-05-19T02:24:33Z

@@ -0,0 +1,62 @@
+---
+name: ptoas-usability-eval
+description: Evaluate PTOAS repository usability for operator reproduction/deployment and the PTOAS-supported subset of basic operator implementation. Use PTOAS repo docs, scripts, samples, and CI config as evidence; score scene 01 as the primary template, score only the build/run/validation subset of scene 04, and mark unsupported scene 02/03/05/06 items as N/A.


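This description field is too long and reads more like a summary. By YAML frontmatter convention, description is usually a concise one-line statement, with the detailed explanation placed in the document body. Consider shortening it to improve readability and consistency.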

Suggested change

-description: Evaluate PTOAS repository usability for operator reproduction/deployment and the PTOAS-supported subset of basic operator implementation. Use PTOAS repo docs, scripts, samples, and CI config as evidence; score scene 01 as the primary template, score only the build/run/validation subset of scene 04, and mark unsupported scene 02/03/05/06 items as N/A.

+description: Evaluate PTOAS repository usability for operator reproduction/deployment (scene 01) and a subset of basic implementation (scene 04).
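
A length check like the one the reviewer applies by eye could be automated. The sketch below is an illustration only: it uses a simple line-based parser instead of a YAML library, handles only single-line `description:` values, and the 160-character limit is a hypothetical threshold that neither the PR nor the review states.

```python
from __future__ import annotations

from typing import Optional

MAX_DESCRIPTION_CHARS = 160  # hypothetical limit, not taken from the PR


def frontmatter_description(text: str) -> Optional[str]:
    """Return the single-line description value from leading YAML frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return None


def description_too_long(text: str, limit: int = MAX_DESCRIPTION_CHARS) -> bool:
    """True when a description exists and exceeds the chosen limit."""
    description = frontmatter_description(text)
    return description is not None and len(description) > limit
```

Under this assumed limit, the original description would be flagged and the suggested replacement would pass.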

gemini-code-assist · 2026-05-19T02:24:33Z

+
+```bash
+rg -n "构建|运行测试|上板验证|compile-only|generate_testcase|run_remote_npu_validation" README.md docs test .github
+rg --files test/samples


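The rg --files test/samples command overlaps in function with the find command on the next line and may cause confusion.

rg --files depends on the state of the Git repository (it lists only tracked files), so it may be unstable in some evaluation environments.

rg --files lists all files recursively and may include too much irrelevant information. By contrast, the find command is more targeted because it uses -maxdepth 2.

To make the instructions clearer and more reliable, consider removing this line and relying only on the more explicit find command to discover the sample files.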
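
For environments where neither tool is convenient, a depth-limited listing can be sketched in Python. The exact find invocation is not visible in this excerpt, so the sketch below only mirrors the `-maxdepth 2` behavior mentioned above; `list_files` is a hypothetical helper. Unlike ripgrep, which applies ignore rules such as `.gitignore` by default, this walk looks only at the filesystem.

```python
from __future__ import annotations

import os
from pathlib import Path


def list_files(root: str | Path, max_depth: int = 2) -> list[str]:
    """List files under root up to max_depth levels, like find -maxdepth.

    Depth follows find's convention: root is depth 0, its direct children
    are depth 1, and so on. Paths are returned relative to root.
    """
    root = Path(root)
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dir_depth = len(Path(dirpath).relative_to(root).parts)
        if dir_depth + 1 <= max_depth:
            for name in filenames:
                found.append((Path(dirpath) / name).relative_to(root).as_posix())
        if dir_depth + 1 >= max_depth:
            # Files inside these subdirectories would exceed max_depth: prune.
            dirnames[:] = []
    return sorted(found)
```

Calling `list_files("test/samples")` would then return sample files at most two levels below that directory, independent of Git state.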

reedhecre · 2026-05-19T02:25:26Z

Codex Review

This comment is updated automatically by the review bot.

PR: #682 docs: add PTOAS usability evaluation skill
Author: HecreReed
Base/Head: main / codex/ptoas-usability-eval-skill
Head SHA: ae0852ece606
Trigger: new commits were pushed to the PR
Generated At: 2026-05-19T12:21:06Z
Previous Head SHA: a9441432821f
Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

 .trae/skills/ptoas-usability-eval/SKILL.md         |  96 ++++++++++++++
 .../skills/ptoas-usability-eval/agents/openai.yaml |   4 +
 .../references/evidence-checklist.md               |  86 ++++++++++++
 .../ptoas-usability-eval/references/metrics-01.md  | 110 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-02.md  |  68 ++++++++++
 .../ptoas-usability-eval/references/metrics-04.md  | 107 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-05.md  |  73 ++++++++++
 .../ptoas-usability-eval/references/metrics-06.md  |  75 +++++++++++
 .../ptoas-usability-eval/references/scope.md       | 129 ++++++++++++++++++
 .../ptoas-usability-eval/references/scoring.md     | 147 +++++++++++++++++++++
 skills/ptoas-usability-eval/README.md              |  28 ++++
 skills/ptoas-usability-eval/SKILL.md               |  96 ++++++++++++++
 skills/ptoas-usability-eval/agents/openai.yaml     |   4 +
 .../references/evidence-checklist.md               |  86 ++++++++++++
 .../ptoas-usability-eval/references/metrics-01.md  | 110 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-02.md  |  68 ++++++++++
 .../ptoas-usability-eval/references/metrics-04.md  | 107 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-05.md  |  73 ++++++++++
 .../ptoas-usability-eval/references/metrics-06.md  |  75 +++++++++++
 skills/ptoas-usability-eval/references/scope.md    | 129 ++++++++++++++++++
 skills/ptoas-usability-eval/references/scoring.md  | 147 +++++++++++++++++++++
 55 files changed, 4615 insertions(+)
===== END STAGE clone rc=0 @ 2026-05-19 20:20:37 =====

===== STAGE codex-review @ 2026-05-19 20:20:37 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019e402e-5417-7251-86d0-733c6c34501c
--------
user
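You are reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #682 docs: add PTOAS usability evaluation skill
Author: HecreReed
base branch: origin/main
head branch: HEAD (the PR head is already checked out)

Requirements:
1. Review only this PR's changes relative to origin/main; consult surrounding files for context when needed.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility problems.
3. Do not raise purely stylistic suggestions or low-value speculation.
4. Report strictly by priority:
   - P1: highly likely to cause wrong results, compile/run failures, severe regressions, or release blockers
   - P2: significant defects, behavioral regressions, missing validation/tests, or major compatibility problems
   - P3: minor but clearly fixable problems
5. If there are no problems, write the summary as exactly: 未检查到 PR #682 存在问题 ("no issues found in PR #682"), and return findings=[].
6. If there are problems, summarize them concisely, and give each entry in findings:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested first steps:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.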

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: e53161f4-00a4-4ace-a3b9-67314028cc7c)
Reconnecting... 2/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 0dc89700-97a4-4923-a0f2-fc0ecec07a66)
Reconnecting... 3/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 9de7370e-db26-4c1d-b276-dc27e8affb74)
Reconnecting... 4/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 24251512-88e3-463e-9ce8-9f17575aa8e4)
Reconnecting... 5/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 4ad38bf5-0168-47a0-a2d0-b9c4909b9d0c)
ERROR: unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: ca39f9e1-5045-4e54-86b7-93aa070cc8b9
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-05-19 20:21:06 =====
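
The review prompt in the log describes the expected result shape: a summary plus a list of findings with severity, title, body, file, and line. The actual review_schema.json is not shown here, so the following is only a sketch of a validator built from that description; `validate_review` is a hypothetical name, and treating file as optional is an assumption.

```python
from __future__ import annotations

ALLOWED_SEVERITIES = {"P1", "P2", "P3"}  # priorities named in the review prompt


def validate_review(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    errors: list[str] = []
    if not isinstance(result.get("summary"), str):
        errors.append("summary must be a string")
    findings = result.get("findings")
    if not isinstance(findings, list):
        errors.append("findings must be a list")
        return errors
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            errors.append(f"findings[{i}] must be an object")
            continue
        if finding.get("severity") not in ALLOWED_SEVERITIES:
            errors.append(f"findings[{i}].severity must be P1, P2, or P3")
        for key in ("title", "body"):
            if not isinstance(finding.get(key), str) or not finding[key]:
                errors.append(f"findings[{i}].{key} must be a non-empty string")
        file_value = finding.get("file")
        if file_value is not None and not isinstance(file_value, str):
            errors.append(f"findings[{i}].file must be a string or null")
        line = finding.get("line")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if line is not None and (not isinstance(line, int) or isinstance(line, bool)):
            errors.append(f"findings[{i}].line must be an integer or null")
    return errors
```

A monitor script could run such a check on the parsed output before posting, so that a run like the one above, which produced no agent message, is reported as a failure rather than as an empty review.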

b76509a docs: add PTOAS usability evaluation skill

HecreReed added 5 commits May 19, 2026 10:39

dd6995d docs: add multi-client PTOAS skill entrypoints
9525316 docs: add layered PTOAS usability evaluation model
a944143 Expand PTOAS usability skill for 02/05/06
101a45d Add aggregate scoring rules to PTOAS usability skill
ae0852e Clarify measured-score wording in usability skill

HecreReed wants to merge 6 commits into hw-native-sys:main from HecreReed:codex/ptoas-usability-eval-skill

2 participants
