
docs: add PTOAS usability evaluation skill #682

Draft
HecreReed wants to merge 6 commits into hw-native-sys:main from HecreReed:codex/ptoas-usability-eval-skill

@HecreReed (Collaborator) commented May 19, 2026

Summary

  • add a PTOAS-specific usability evaluation skill
  • keep a tool-neutral source copy under skills/ptoas-usability-eval/
  • add direct discovery entrypoints for Codex, Cursor, Trae, and Claude Code
  • add a layered evaluation model so documentation review, local minimal execution, Linux compile-only, and NPU board runs are scored separately
  • scope the default evaluation to 01 算子复现部署 (operator reproduction and deployment) and the PTOAS-supported subset of 04 算子基本功能实现 (basic operator functionality implementation)
  • add reference guides for scope mapping, evidence collection, layer gating, and scoring rules
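The default scoping described above can be sketched as a lookup table. This is a minimal illustration, assuming scenarios are identified by the two-digit IDs 01 to 06; the function name `scene_scope` and its output labels are hypothetical and do not come from the skill files.

```bash
# Hypothetical helper (not part of the skill): map a scenario ID to how the
# default evaluation treats it, following the scoping described in this PR.
scene_scope() {
  case "$1" in
    01) echo "scored" ;;          # primary template: operator reproduction/deployment
    04) echo "scored-subset" ;;   # only the PTOAS-supported build/run/validation subset
    02|03|05|06) echo "N/A" ;;    # not evidenced by the repo by default; no forced score
    *) echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Print the default scope table.
for scene in 01 02 03 04 05 06; do
  printf '%s\t%s\n' "$scene" "$(scene_scope "$scene")"
done
```

Returning an explicit N/A label, rather than a zero score, keeps unsupported scenarios from dragging down the aggregate.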

Directory Layout

  • tool-neutral source: skills/ptoas-usability-eval/
  • Codex entrypoint: .codex/skills/ptoas-usability-eval/
  • Cursor entrypoint: .cursor/skills/ptoas-usability-eval/
  • Trae entrypoint: .trae/skills/ptoas-usability-eval/
  • Claude Code entrypoint: .claude/skills/ptoas-usability-eval/
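If each client entrypoint is meant to be a full copy of the tool-neutral source (the diffstat in the review log further down shows identical line counts for the `.trae/` and `skills/` copies, but the PR does not state whether copies, symlinks, or generated files are intended), a drift check could look like the sketch below. The function name `check_entrypoints` is hypothetical, and README.md is excluded on the assumption that it lives only in the neutral copy.

```bash
# Hypothetical drift check (not part of the PR): each client entrypoint is
# assumed to be a full copy of the tool-neutral source, minus README.md.
check_entrypoints() {
  local root="${1:-.}" status=0 client src dst
  src="$root/skills/ptoas-usability-eval"
  for client in .codex .cursor .trae .claude; do
    dst="$root/$client/skills/ptoas-usability-eval"
    if [ ! -d "$dst" ]; then
      echo "missing: $dst"
      status=1
    elif ! diff -rq -x README.md "$src" "$dst" >/dev/null; then
      # Any content difference or extra/missing file counts as drift.
      echo "drift: $dst differs from $src"
      status=1
    fi
  done
  return "$status"
}
```

Run from the repository root as `check_entrypoints .`; a non-zero exit status means at least one entrypoint is missing or has drifted from the neutral source.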

Why

PTOAS does not map cleanly onto all six operator usability scenarios. This skill limits evaluation to the parts for which the repo itself provides evidence (its docs, scripts, samples, and CI configuration) and marks unsupported areas as N/A rather than forcing misleading scores.

The layered model matters because bisheng and CANN compile-only checks require a Linux + CANN environment, while NPU board validation requires a server with an NPU device attached. The skill now requires evaluators to declare the covered layer first, and marks higher layers as 未实测 (not tested) instead of treating a missing local environment as a PTOAS usability failure.
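The gating rule can be illustrated with a small sketch: given the highest layer the evaluator declares as covered, every layer above it is reported as 未实测 rather than failed. The layer identifiers and the function name `layer_report` are hypothetical; the ordering follows the summary (documentation review, local minimal execution, Linux compile-only, NPU board run).

```bash
# Hypothetical layer IDs, ordered from the least to the most demanding
# environment: doc review, local minimal run, Linux+CANN compile-only, NPU board.
LAYERS="docs-review local-minimal linux-compile-only npu-board"

# Print "<layer> <status>" for each layer, given the highest covered layer.
# Layers above it are reported as 未实测 (not tested), never as failures.
layer_report() {
  local covered="$1" above=0 layer
  # The covered layer must be declared up front and must be a known layer.
  case " $LAYERS " in
    *" $covered "*) ;;
    *) echo "undeclared or unknown layer: '$covered'" >&2; return 1 ;;
  esac
  for layer in $LAYERS; do
    if [ "$above" -eq 1 ]; then
      echo "$layer 未实测"
    else
      echo "$layer scored"
    fi
    if [ "$layer" = "$covered" ]; then
      above=1
    fi
  done
}

layer_report local-minimal
```

An evaluator who only ran the local minimal example gets scores for the first two layers and 未实测 for the compile-only and board layers, so the missing Linux/NPU environment is visible without being counted against PTOAS.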

Included content

  • SKILL.md: trigger conditions, workflow, output shape, scope boundaries, and layer gating
  • references/scope.md: scenario mapping, exclusions, and evaluation layers
  • references/evidence-checklist.md: canonical repo evidence sources, search order, and layer mapping
  • references/metrics-01.md: scoring guidance for 01 算子复现部署 (operator reproduction and deployment)
  • references/metrics-04.md: scoring guidance for the PTOAS-supported subset of 04
  • agents/openai.yaml: UI metadata where applicable
  • README.md: explains the neutral source copy and per-client entrypoints
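A quick way to confirm a skill directory contains the files listed above is sketched below. The file list is copied from this PR description rather than read from the repo, README.md is left out on the assumption that it belongs only to the neutral copy, and the function name `check_skill_layout` is hypothetical.

```bash
# Hypothetical layout check: report any file from the "Included content"
# list that is missing from a skill directory. Exit status 0 means complete.
check_skill_layout() {
  local dir="$1" missing=0 f
  for f in SKILL.md agents/openai.yaml references/scope.md \
           references/evidence-checklist.md references/metrics-01.md \
           references/metrics-04.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_skill_layout skills/ptoas-usability-eval` checks the neutral copy; the same call works for each client entrypoint directory.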

Notes

  • no production code changes
  • no build/test impact
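The "no production code changes" note can be checked mechanically. A minimal sketch, assuming the only intended paths are the five skill directories listed under Directory Layout; the function name `non_skill_paths` and the sample path `lib/Foo.cpp` are hypothetical.

```bash
# Hypothetical guard for a docs-only PR: read changed paths on stdin and print
# any path outside the skill directories. Empty output means docs-only.
non_skill_paths() {
  # grep exits 1 when every line is filtered out; that is the success case.
  grep -Ev '^(skills|\.codex/skills|\.cursor/skills|\.trae/skills|\.claude/skills)/ptoas-usability-eval/' || true
}

printf '%s\n' skills/ptoas-usability-eval/SKILL.md lib/Foo.cpp | non_skill_paths  # prints: lib/Foo.cpp
```

In a checkout of the PR, empty output from `git diff --name-only origin/main...HEAD | non_skill_paths` would support the claim that no production code is touched.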

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a comprehensive usability evaluation framework for the PTOAS repository, including a new skill definition, detailed metrics for operator reproduction and implementation, and an evidence checklist. The review feedback suggests refining the skill description for brevity and removing redundant search commands in the evidence checklist to improve reliability and clarity.

@@ -0,0 +1,62 @@
---
name: ptoas-usability-eval
description: Evaluate PTOAS repository usability for operator reproduction/deployment and the PTOAS-supported subset of basic operator implementation. Use PTOAS repo docs, scripts, samples, and CI config as evidence; score scene 01 as the primary template, score only the build/run/validation subset of scene 04, and mark unsupported scene 02/03/05/06 items as N/A.

medium

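The description field is too long and reads more like a summary. By YAML frontmatter convention, description is usually a concise one-line description, with the details placed in the document body. Consider shortening it for readability and consistency.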

Suggested change
description: Evaluate PTOAS repository usability for operator reproduction/deployment and the PTOAS-supported subset of basic operator implementation. Use PTOAS repo docs, scripts, samples, and CI config as evidence; score scene 01 as the primary template, score only the build/run/validation subset of scene 04, and mark unsupported scene 02/03/05/06 items as N/A.
description: Evaluate PTOAS repository usability for operator reproduction/deployment (scene 01) and a subset of basic implementation (scene 04).
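To compare the current and suggested values, a small helper that pulls one key out of SKILL.md frontmatter can help. This is a sketch assuming simple single-line `key: value` entries between the first two `---` lines; it does not handle quoted or multi-line YAML, and the name `frontmatter_field` is hypothetical.

```bash
# Hypothetical helper: print the value of a top-level key from a Markdown
# file's YAML frontmatter. Line-based parsing only; not a full YAML parser.
frontmatter_field() {
  awk -v key="$2" '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # frontmatter must start on line 1
    in_fm && $0 == "---"   { exit }              # stop at the closing delimiter
    in_fm && index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  ' "$1"
}

demo=$(mktemp)
printf '%s\n' '---' 'name: ptoas-usability-eval' \
  'description: Evaluate PTOAS repository usability for operator reproduction/deployment (scene 01) and a subset of basic implementation (scene 04).' \
  '---' > "$demo"
frontmatter_field "$demo" description
rm -f "$demo"
```

Piping the output through `wc -c` gives a quick length comparison between the current and the suggested description.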


```bash
rg -n "构建|运行测试|上板验证|compile-only|generate_testcase|run_remote_npu_validation" README.md docs test .github
rg --files test/samples
```

medium

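The rg --files test/samples command overlaps with the find command on the next line and may cause confusion.

  • rg --files depends on Git repository state: by default it applies .gitignore rules and skips hidden files, so it lists only non-ignored files, which may be unstable in some evaluation environments.
  • rg --files lists every file recursively, which can add a lot of irrelevant output. The find command, limited by -maxdepth 2, is more targeted.

To keep the instructions clear and reliable, consider removing this line and relying only on the more explicit find command to discover sample files.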
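For comparison, a depth-limited find listing behaves the same regardless of ignore files. This is an illustrative sketch only: the actual find line in the PR is not shown in this excerpt, and `list_samples` is a hypothetical name. If ripgrep is preferred, `rg --files --no-ignore --hidden --max-depth 2 test/samples` turns off its ignore-file and hidden-file filtering.

```bash
# Hypothetical helper: list files at most two levels below a sample directory.
# Unlike `rg --files`, find does not consult .gitignore or skip hidden files,
# so the result does not depend on the repository's ignore configuration.
list_samples() {
  find "${1:-test/samples}" -maxdepth 2 -type f | sort
}
```

Sorting the output makes listings from different machines directly comparable in an evaluation report.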

@reedhecre

reedhecre commented May 19, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: #682 docs: add PTOAS usability evaluation skill
  • Author: HecreReed
  • Base/Head: main / codex/ptoas-usability-eval-skill
  • Head SHA: ae0852ece606
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-05-19T12:21:06Z
  • Previous Head SHA: a9441432821f
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

 .trae/skills/ptoas-usability-eval/SKILL.md         |  96 ++++++++++++++
 .../skills/ptoas-usability-eval/agents/openai.yaml |   4 +
 .../references/evidence-checklist.md               |  86 ++++++++++++
 .../ptoas-usability-eval/references/metrics-01.md  | 110 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-02.md  |  68 ++++++++++
 .../ptoas-usability-eval/references/metrics-04.md  | 107 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-05.md  |  73 ++++++++++
 .../ptoas-usability-eval/references/metrics-06.md  |  75 +++++++++++
 .../ptoas-usability-eval/references/scope.md       | 129 ++++++++++++++++++
 .../ptoas-usability-eval/references/scoring.md     | 147 +++++++++++++++++++++
 skills/ptoas-usability-eval/README.md              |  28 ++++
 skills/ptoas-usability-eval/SKILL.md               |  96 ++++++++++++++
 skills/ptoas-usability-eval/agents/openai.yaml     |   4 +
 .../references/evidence-checklist.md               |  86 ++++++++++++
 .../ptoas-usability-eval/references/metrics-01.md  | 110 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-02.md  |  68 ++++++++++
 .../ptoas-usability-eval/references/metrics-04.md  | 107 +++++++++++++++
 .../ptoas-usability-eval/references/metrics-05.md  |  73 ++++++++++
 .../ptoas-usability-eval/references/metrics-06.md  |  75 +++++++++++
 skills/ptoas-usability-eval/references/scope.md    | 129 ++++++++++++++++++
 skills/ptoas-usability-eval/references/scoring.md  | 147 +++++++++++++++++++++
 55 files changed, 4615 insertions(+)
===== END STAGE clone rc=0 @ 2026-05-19 20:20:37 =====

===== STAGE codex-review @ 2026-05-19 20:20:37 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/review_prompt.txt'
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019e402e-5417-7251-86d0-733c6c34501c
--------
user
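You are reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #682 docs: add PTOAS usability evaluation skill
Author: HecreReed
base branch: origin/main
head branch: HEAD (the PR head is currently checked out)

Requirements:
1. Review only this PR's changes relative to origin/main; consult surrounding context files when necessary.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility issues.
3. Do not raise purely stylistic suggestions or low-value speculation.
4. Output strictly by priority:
   - P1: highly likely to cause incorrect results, build/run failures, severe regressions, or release blockers
   - P2: significant defects, behavioral regressions, missing validation/tests, or major compatibility issues
   - P3: minor but clearly fixable issues
5. If there are no issues, write the summary as "未检查到 PR #682 存在问题" ("no issues found in PR #682") and return findings=[].
6. If there are issues, keep the summary concise; every entry in findings must include:
   - severity
   - title
   - body (why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested first steps:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.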

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: e53161f4-00a4-4ace-a3b9-67314028cc7c)
Reconnecting... 2/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 0dc89700-97a4-4923-a0f2-fc0ecec07a66)
Reconnecting... 3/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 9de7370e-db26-4c1d-b276-dc27e8affb74)
Reconnecting... 4/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 24251512-88e3-463e-9ce8-9f17575aa8e4)
Reconnecting... 5/5 (unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: 4ad38bf5-0168-47a0-a2d0-b9c4909b9d0c)
ERROR: unexpected status 502 Bad Gateway: Upstream service temporarily unavailable, url: https://codex.0u0o.com/responses, request id: ca39f9e1-5045-4e54-86b7-93aa070cc8b9
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260519_202032_pr682/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-05-19 20:21:06 =====
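The Status line above ("failed at codex-review (exit=1)") corresponds to the END STAGE marker with rc=1 in this log tail. A sketch of how such a status could be derived from the stage markers, assuming the `===== END STAGE <name> rc=<n> @ <timestamp> =====` format shown here; the function name `first_failed_stage` is hypothetical and not part of the review bot.

```bash
# Hypothetical parser for the review bot's stage markers. Reads a log on stdin
# and prints "failed at <stage> (exit=<rc>)" for the first stage with rc != 0,
# or "ok" if every recorded stage ended with rc=0.
first_failed_stage() {
  awk '
    /^===== END STAGE / {
      stage = $4                 # e.g. "codex-review"
      rc = $5                    # e.g. "rc=1"
      sub(/^rc=/, "", rc)
      if (rc != "0") { printf "failed at %s (exit=%s)\n", stage, rc; found = 1; exit }
    }
    END { if (!found) print "ok" }
  '
}

printf '%s\n' \
  '===== END STAGE clone rc=0 @ 2026-05-19 20:20:37 =====' \
  '===== END STAGE codex-review rc=1 @ 2026-05-19 20:21:06 =====' |
  first_failed_stage   # prints: failed at codex-review (exit=1)
```

Keying on the END marker rather than on error text keeps the status stable even when upstream errors, like the 502 responses above, change wording.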


Labels

None yet

Projects

None yet


2 participants