Commit 26945dd

100yenadmin and Eva authored
agents: GPT-5.4 runtime completion rollup (openclaw#65219)
* agents: auto-activate strict-agentic for GPT-5 and emit blocked-exit liveness

  Closes two hard blockers on the GPT-5.4 parity completion gate:

  1) Criterion 1 (no stalls after planning) is universal, but the pre-existing strict-agentic execution contract was opt-in only. Out-of-the-box GPT-5 openai / openai-codex users who never set `agents.defaults.embeddedPi.executionContract` still got only 1 planning-only retry and then fell through to the normal completion path with the plan-only text, i.e. they still stalled.

     Introduce `resolveEffectiveExecutionContract(...)` in src/agents/execution-contract.ts. Behavior:

     - supported provider/model (openai or openai-codex + gpt-5-family) AND explicit "strict-agentic" or unspecified → "strict-agentic"
     - supported provider/model AND explicit "default" → "default" (opt-out)
     - unsupported provider/model → "default" regardless of explicit value

     `isStrictAgenticExecutionContractActive` now delegates to the effective resolver so the 2-retry + blocked-state treatment applies by default to every GPT-5 openai/codex run. Explicit opt-out still works for users who intentionally want the pre-parity-program behavior.

  2) Criterion 4 (replay/liveness failures are explicit, not silent disappearance) is violated by the strict-agentic blocked exit itself. Every other terminal return path in src/agents/pi-embedded-runner/run.ts sets `replayInvalid` + `livenessState` via `setTerminalLifecycleMeta`, but the strict-agentic exit at run.ts:1615 falls through without them.

     Add explicit `livenessState: "abandoned"` + `replayInvalid` (via the shared `resolveReplayInvalidForAttempt` helper) to that exit, plus a `setTerminalLifecycleMeta` call so downstream observers (lifecycle log, ACP bridge, telemetry) see the same explicit terminal state they see on every other exit branch.

  Regressions added:

  - `auto-enables update_plan for unconfigured GPT-5 openai runs`
  - `respects explicit default contract opt-out on GPT-5 runs`
  - `does not auto-enable update_plan for non-openai providers even when unconfigured`
  - `emits explicit replayInvalid + abandoned liveness state at the strict-agentic blocked exit`
  - `auto-activates strict-agentic for unconfigured GPT-5 openai runs and surfaces the blocked state`
  - `respects explicit default contract opt-out on GPT-5 openai runs`

  Local validation:

  - pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/agents/openclaw-tools.sessions.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts

    122/122 passing.

  Refs openclaw#64227

* agents: address loop-6 review comments on strict-agentic contract

  Triages all three loop-6 review comments on PR openclaw#64679:

  1. Copilot: 'The strict-agentic blocked exit returns an error payload (isError: true) but sets livenessState to "abandoned". Elsewhere in the runner/lifecycle flow, error terminal states are treated as "blocked".'

     Verified: every other hardcoded error terminal branch in run.ts (role ordering at 1152, image size at 1206, schema error at 1244, compaction timeout at 1128, aborted-with-no-payloads at 606) uses livenessState: "blocked". Match that convention at the strict-agentic blocked exit at 1634. Updated the 'emits explicit replayInvalid + abandoned liveness state' regression test to assert the new "blocked" value and renamed the assertion commentary.

  2. Copilot: 'The JSDoc for resolveEffectiveExecutionContract says explicit "strict-agentic" in config always resolves to "strict-agentic", but the implementation collapses to "default" whenever the provider/mode is unsupported.'

     Rewrite the JSDoc to explicitly document the unsupported-provider collapse as the lead case (strict-agentic is a GPT-5-family openai/openai-codex-only runtime contract) before listing the supported-lane behavior matrix. No code change; this is a docstring-only clarification.

  3. Greptile P2: 'Non-preferred Anthropic model constant. CLAUDE.md says to prefer sonnet-4.6 for Anthropic test constants.'

     Swap claude-opus-4-6 → claude-sonnet-4-6 in the two update_plan gating fixtures that assert non-openai providers don't auto-enable the planning tool. Behavior unchanged; model constant now matches repo testing guidance.

  Local validation:

  - pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts

    29/29 passing.

  Refs openclaw#64227

* test: rename strict-agentic blocked-exit liveness regression to match blocked state

  Addresses loop-7 Copilot finding on PR openclaw#64679: loop 6 changed the assertion to livenessState === 'blocked' to match the rest of the hard-error terminal branches in run.ts, but the test title still said 'abandoned liveness state', which made failures and test output misleading. Rename the test title to match the asserted value. No code change beyond the it(...) title.

  Validation: pnpm test src/agents/pi-embedded-runner/run.incomplete-turn.test.ts (19/19 pass).

  Refs openclaw#64227

* agents: widen strict-agentic auto-activation to handle prefixed and variant GPT-5 model ids

* Align strict-agentic retry matching

* runtime: harden strict-agentic model matching

---------

Co-authored-by: Eva <eva@100yen.org>
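The behavior matrix described in the first commit of this rollup can be sketched as a small pure function. This is only an illustration under stated assumptions: `resolveContractSketch` and its two direct inputs are hypothetical names, whereas the real `resolveEffectiveExecutionContract` derives both values from the config, session, provider, and model id.

```typescript
// Hypothetical, simplified stand-in for the resolver described in the commit
// message. It takes the two facts the matrix depends on as plain arguments
// instead of reading them from config and provider/model ids.
type ExecutionContract = "default" | "strict-agentic";

function resolveContractSketch(
  supportedLane: boolean, // openai or openai-codex with a gpt-5-family model id
  explicit: ExecutionContract | undefined, // value set in config, if any
): ExecutionContract {
  // Unsupported provider/model: always "default", even if config says otherwise.
  if (!supportedLane) {
    return "default";
  }
  // Supported lane with an explicit opt-out.
  if (explicit === "default") {
    return "default";
  }
  // Supported lane with explicit "strict-agentic" or nothing configured.
  return "strict-agentic";
}

// Walk every combination of the two inputs and print the resolved contract.
const explicitValues: Array<ExecutionContract | undefined> = [
  undefined,
  "default",
  "strict-agentic",
];
for (const supportedLane of [true, false]) {
  for (const explicit of explicitValues) {
    const resolved = resolveContractSketch(supportedLane, explicit);
    console.log(`supported=${supportedLane} explicit=${String(explicit)} -> ${resolved}`);
  }
}
```

Checking the supported lane first is what makes an explicit "strict-agentic" on an unsupported provider a no-op rather than an error, which matches the third row of the matrix in the message.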
1 parent b429379 commit 26945dd

6 files changed

Lines changed: 538 additions & 16 deletions

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
+import { describe, expect, it } from "vitest";
+import type { OpenClawConfig } from "../config/types.openclaw.js";
+import {
+  isStrictAgenticExecutionContractActive,
+  resolveEffectiveExecutionContract,
+} from "./execution-contract.js";
+
+describe("resolveEffectiveExecutionContract", () => {
+  const supportedProvider = "openai";
+  const unsupportedProvider = "anthropic";
+  const emptyConfig: OpenClawConfig = {};
+
+  describe("supported provider + model detection", () => {
+    it("auto-activates on bare gpt-5 model ids", () => {
+      expect(
+        resolveEffectiveExecutionContract({
+          config: emptyConfig,
+          provider: supportedProvider,
+          modelId: "gpt-5.4",
+        }),
+      ).toBe("strict-agentic");
+    });
+
+    it("auto-activates on gpt-5o and variants without a separator", () => {
+      for (const modelId of ["gpt-5", "gpt-5o", "gpt-5o-mini"]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("strict-agentic");
+      }
+    });
+
+    it("auto-activates on dot-separated variants", () => {
+      for (const modelId of ["gpt-5.0", "gpt-5.4", "gpt-5.4-alt", "gpt-5.99"]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("strict-agentic");
+      }
+    });
+
+    it("auto-activates on dash-separated variants", () => {
+      for (const modelId of ["gpt-5-preview", "gpt-5-turbo", "gpt-5-2025-03"]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("strict-agentic");
+      }
+    });
+
+    it("auto-activates on prefixed model ids (openai/gpt-5.4, openai:gpt-5.4)", () => {
+      // Regression for the adversarial review finding: prefixed model ids
+      // must strip the provider prefix before matching the regex.
+      for (const modelId of [
+        "openai/gpt-5.4",
+        "openai:gpt-5.4",
+        "openai/gpt-5o-mini",
+        "openai-codex/gpt-5.4",
+        "openai-codex:gpt-5.4",
+        " openai/gpt-5.4 ",
+        " OPENAI:GPT-5.4 ",
+      ]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("strict-agentic");
+      }
+    });
+
+    it("is case-insensitive", () => {
+      for (const modelId of ["GPT-5.4", "Gpt-5O", "OPENAI/GPT-5.4"]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("strict-agentic");
+      }
+    });
+
+    it("does not match non-gpt-5 family ids", () => {
+      for (const modelId of [
+        "gpt-4.5",
+        "gpt-4o",
+        "gpt-6",
+        "gpt-50",
+        "claude-opus-4-6",
+        "llama-3-70b",
+        "mistral-large",
+      ]) {
+        expect(
+          resolveEffectiveExecutionContract({
+            config: emptyConfig,
+            provider: supportedProvider,
+            modelId,
+          }),
+        ).toBe("default");
+      }
+    });
+
+    it("collapses to default on unsupported providers even with gpt-5 model ids", () => {
+      expect(
+        resolveEffectiveExecutionContract({
+          config: emptyConfig,
+          provider: unsupportedProvider,
+          modelId: "gpt-5.4",
+        }),
+      ).toBe("default");
+    });
+  });
+
+  describe("explicit override behavior", () => {
+    it("honors explicit strict-agentic on the supported lane", () => {
+      const config: OpenClawConfig = {
+        agents: {
+          defaults: {
+            embeddedPi: {
+              executionContract: "strict-agentic",
+            },
+          },
+        },
+      };
+      expect(
+        resolveEffectiveExecutionContract({
+          config,
+          provider: supportedProvider,
+          modelId: "gpt-5.4",
+        }),
+      ).toBe("strict-agentic");
+    });
+
+    it("honors explicit default opt-out even on the supported lane", () => {
+      const config: OpenClawConfig = {
+        agents: {
+          defaults: {
+            embeddedPi: {
+              executionContract: "default",
+            },
+          },
+        },
+      };
+      expect(
+        resolveEffectiveExecutionContract({
+          config,
+          provider: supportedProvider,
+          modelId: "gpt-5.4",
+        }),
+      ).toBe("default");
+    });
+
+    it("collapses explicit strict-agentic to default on an unsupported lane", () => {
+      const config: OpenClawConfig = {
+        agents: {
+          defaults: {
+            embeddedPi: {
+              executionContract: "strict-agentic",
+            },
+          },
+        },
+      };
+      expect(
+        resolveEffectiveExecutionContract({
+          config,
+          provider: unsupportedProvider,
+          modelId: "claude-opus-4-6",
+        }),
+      ).toBe("default");
+    });
+  });
+
+  describe("active flag helper", () => {
+    it("returns true when the effective contract is strict-agentic", () => {
+      expect(
+        isStrictAgenticExecutionContractActive({
+          config: emptyConfig,
+          provider: supportedProvider,
+          modelId: "openai/gpt-5.4",
+        }),
+      ).toBe(true);
+    });
+
+    it("returns false when the effective contract is default", () => {
+      expect(
+        isStrictAgenticExecutionContractActive({
+          config: emptyConfig,
+          provider: supportedProvider,
+          modelId: "gpt-4.5",
+        }),
+      ).toBe(false);
+    });
+  });
+});
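
The model-id fixtures in the test file above exercise two steps: strip an optional `provider/` or `provider:` prefix, then match the bare id against a gpt-5-family pattern. The following is a minimal standalone sketch of those two steps, using a local copy of the regex from this commit's src/agents/execution-contract.ts. The wrapper name `isGpt5FamilyId` is hypothetical and the provider check is deliberately left out.

```typescript
// Local copy of the prefix-stripping step: trim, drop everything up to the
// first "/" or ":", and lowercase the remainder.
function stripProviderPrefix(modelId: string): string {
  const trimmed = modelId.trim();
  const match = /^([^/:]+)[/:](.+)$/.exec(trimmed);
  return (match ? match[2] : trimmed).toLowerCase();
}

// Local copy of the family pattern: "gpt-5" followed by ".", "o", "-", or
// the end of the string.
const GPT5_FAMILY_PATTERN = /^gpt-5(?:[.o-]|$)/i;

// Hypothetical wrapper (not in the commit) combining both steps.
function isGpt5FamilyId(modelId: string): boolean {
  return GPT5_FAMILY_PATTERN.test(stripProviderPrefix(modelId));
}

// A few ids taken from the fixtures, printed with their match result.
const sampleIds = [
  "gpt-5.4",
  " OPENAI:GPT-5.4 ",
  "openai-codex/gpt-5.4",
  "gpt-5-2025-03",
  "gpt-50",
  "gpt-4o",
];
for (const id of sampleIds) {
  console.log(JSON.stringify(id), isGpt5FamilyId(id));
}
```

The `(?:[.o-]|$)` tail is what keeps `gpt-50` out of the family: the character after `gpt-5` has to be a recognized separator, the letter `o`, or nothing at all.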

src/agents/execution-contract.ts

Lines changed: 104 additions & 8 deletions
@@ -2,24 +2,120 @@ import type { OpenClawConfig } from "../config/types.openclaw.js";
 import { normalizeLowercaseStringOrEmpty } from "../shared/string-coerce.js";
 import { resolveAgentExecutionContract, resolveSessionAgentIds } from "./agent-scope.js";
 
-export function isStrictAgenticExecutionContractActive(params: {
+/**
+ * Strip any leading `provider/` or `provider:` prefix from a model id so the
+ * bare-name regex matching below works against `openai/gpt-5.4` and
+ * `openai:gpt-5.4` the same way it does against `gpt-5.4`. Returns the bare
+ * model id lowercased for comparison.
+ *
+ * Without this, auto-activation silently failed on prefixed model ids — a
+ * user who configured `model: "openai/gpt-5.4"` in their agent config would
+ * get the pre-PR-H looser default behavior because the regex only matched
+ * bare names. The adversarial review in #64227 flagged this as a quality
+ * gap on completion-gate criterion 1.
+ */
+function stripProviderPrefix(modelId: string): string {
+  const normalizedModelId = modelId.trim();
+  const match = /^([^/:]+)[/:](.+)$/.exec(normalizedModelId);
+  return (match?.[2] ?? normalizedModelId).toLowerCase();
+}
+
+/**
+ * Regex that matches the full set of GPT-5 variants the strict-agentic
+ * contract should auto-activate for. Intentionally permissive: every
+ * model id in the gpt-5 family should opt in by default, not just the
+ * canonical `gpt-5.4`.
+ *
+ * Covers:
+ *   - `gpt-5`, `gpt-5o`, `gpt-5o-mini` (no separator after `5`)
+ *   - `gpt-5.4`, `gpt-5.4-alt`, `gpt-5.0` (dot separator)
+ *   - `gpt-5-preview`, `gpt-5-turbo`, `gpt-5-2025-03` (dash separator)
+ *
+ * Does NOT cover `gpt-4.5`, `gpt-6`, or any non-gpt-5 family member.
+ */
+const STRICT_AGENTIC_MODEL_ID_PATTERN = /^gpt-5(?:[.o-]|$)/i;
+
+/**
+ * Supported provider + model combinations where strict-agentic is the intended
+ * runtime contract. Kept as a narrow helper so both the execution-contract
+ * resolver and the `update_plan` auto-enable gate converge on the same
+ * definition of "GPT-5-family openai/openai-codex run".
+ */
+export function isStrictAgenticSupportedProviderModel(params: {
+  provider?: string | null;
+  modelId?: string | null;
+}): boolean {
+  const provider = normalizeLowercaseStringOrEmpty(params.provider ?? "");
+  if (provider !== "openai" && provider !== "openai-codex") {
+    return false;
+  }
+  const modelId = typeof params.modelId === "string" ? params.modelId : "";
+  const bareModelId = stripProviderPrefix(modelId);
+  return STRICT_AGENTIC_MODEL_ID_PATTERN.test(bareModelId);
+}
+
+/**
+ * Returns the effective execution contract for an embedded Pi run.
+ *
+ * strict-agentic is a GPT-5-family openai/openai-codex-only runtime contract,
+ * so an unsupported provider/model pair always collapses to `"default"`
+ * regardless of what the caller passed or what config says — the contract
+ * is inert off-provider. Within the supported lane, the behavior matrix is:
+ *
+ * - Supported provider/model + explicit `"strict-agentic"` in config
+ *   (defaults or per-agent override) ⇒ `"strict-agentic"`.
+ * - Supported provider/model + explicit `"default"` in config ⇒ `"default"`
+ *   (opt-out honored).
+ * - Supported provider/model + unspecified ⇒ `"strict-agentic"` so the
+ *   no-stall completion-gate criterion applies to out-of-the-box GPT-5 runs
+ *   without requiring every user to set the flag.
+ * - Unsupported provider/model (anything that is not openai or openai-codex
+ *   with a gpt-5-family model id) ⇒ `"default"`, even when the config
+ *   explicitly sets `"strict-agentic"`. The retry guard and blocked-exit
+ *   helpers all check this lane again, so an explicit `"strict-agentic"`
+ *   on an unsupported lane is a no-op rather than a hard failure.
+ *
+ * This means explicit opt-out still works, but the gate criterion
+ * "GPT-5.4 no longer stalls after planning" now covers unconfigured
+ * installations, not only users who opted in manually.
+ */
+export function resolveEffectiveExecutionContract(params: {
   config?: OpenClawConfig;
   sessionKey?: string;
   agentId?: string | null;
   provider?: string | null;
   modelId?: string | null;
-}): boolean {
+}): "default" | "strict-agentic" {
   const { sessionAgentId } = resolveSessionAgentIds({
     sessionKey: params.sessionKey,
     config: params.config,
     agentId: params.agentId ?? undefined,
   });
-  if (resolveAgentExecutionContract(params.config, sessionAgentId) !== "strict-agentic") {
-    return false;
+  const explicit = resolveAgentExecutionContract(params.config, sessionAgentId);
+  // strict-agentic is a GPT-5-family openai/openai-codex runtime contract
+  // regardless of whether it was set explicitly or auto-activated. On an
+  // unsupported provider/model pair the contract is inert either way, so
+  // the effective value collapses to "default".
+  const supported = isStrictAgenticSupportedProviderModel({
+    provider: params.provider,
+    modelId: params.modelId,
+  });
+  if (!supported) {
+    return "default";
   }
-  const provider = normalizeLowercaseStringOrEmpty(params.provider ?? "");
-  if (provider !== "openai" && provider !== "openai-codex") {
-    return false;
+  if (explicit === "default") {
+    return "default";
   }
-  return /^gpt-5(?:[.-]|$)/i.test(params.modelId?.trim() ?? "");
+  // Explicit strict-agentic OR unspecified-but-supported → strict-agentic.
+  return "strict-agentic";
+}
+
+export function isStrictAgenticExecutionContractActive(params: {
+  config?: OpenClawConfig;
+  sessionKey?: string;
+  agentId?: string | null;
+  provider?: string | null;
+  modelId?: string | null;
+}): boolean {
+  return resolveEffectiveExecutionContract(params) === "strict-agentic";
 }
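
The runner code that consumes the resolved contract (src/agents/pi-embedded-runner/run.ts) is not part of this excerpt, so what follows is only a rough sketch of the behavior the commit message describes. Under "default", a plan-only turn gets 1 retry and then completes with the plan text; under "strict-agentic", it gets 2 retries and then ends in an explicit blocked state. The names `planningOnlyRetryLimit`, `nextStepAfterPlanOnlyTurn`, and `PlanOnlyStep` are hypothetical and do not come from the repository.

```typescript
type ExecutionContract = "default" | "strict-agentic";

// Hypothetical outcome labels for a turn that produced only a plan.
type PlanOnlyStep = "retry" | "blocked" | "complete-with-plan-text";

// Retry budgets as stated in the commit message: 1 for the default
// contract, 2 for strict-agentic. (Assumed helper, not from run.ts.)
function planningOnlyRetryLimit(contract: ExecutionContract): number {
  return contract === "strict-agentic" ? 2 : 1;
}

// Decide what happens after a plan-only turn, given how many planning-only
// retries have already been spent on this run.
function nextStepAfterPlanOnlyTurn(
  contract: ExecutionContract,
  retriesUsed: number,
): PlanOnlyStep {
  if (retriesUsed < planningOnlyRetryLimit(contract)) {
    return "retry";
  }
  // Budget exhausted: strict-agentic surfaces a blocked terminal state,
  // while the default contract falls through with the plan-only text.
  return contract === "strict-agentic" ? "blocked" : "complete-with-plan-text";
}

for (const contract of ["default", "strict-agentic"] as const) {
  for (let retriesUsed = 0; retriesUsed <= 2; retriesUsed += 1) {
    console.log(contract, retriesUsed, nextStepAfterPlanOnlyTurn(contract, retriesUsed));
  }
}
```

In this reading, `isStrictAgenticExecutionContractActive` is the single switch between the two branches, which is why routing it through the effective resolver changes the behavior of unconfigured GPT-5 runs.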
