Type: Bug
Subject: Incident Report – Copilot Chat agent mode announced implementations that were not delivered
Hello GitHub Copilot Support,
I am reporting an incident involving unreliable execution behavior from Copilot Chat in agent mode inside VS Code.
Context:
Environment: macOS
Repository: masterbourbon/voiliers-ml-recognition (branch: main)
Primary request: implement a scalable WebAdmin “500+ records edit” UX (instant edit modal from list rows)
Secondary tasks during session: CI workflow and docs hygiene updates
Issue summary:
The agent repeatedly stated it would “implement now” or that it was proceeding, but the requested primary feature was not actually delivered.
The pattern repeated multiple times: execution promises, no complete delivery, then new promises.
The agent later acknowledged the target implementation had not been completed.
User impact:
Significant trust loss
Time wasted validating what was truly changed vs what was only claimed
High frustration in the session
Expected vs observed:
Expected:
Either a real implementation with evidence (changed files + validation),
Or an explicit immediate blocker with technical reason.
Observed:
Multiple “done/doing now” style statements without final delivery of the requested feature.
What was partially completed:
A docs hygiene check was merged into ci.yml, and the separate workflow was removed.
deploy-to-ser9.sh was updated to remove obsolete root docs on the server.
A partial WebAdmin usability improvement was made (scroll/focus/quick return), but not the requested full “500+ modal edit” implementation.
Primary behavior to investigate:
Inconsistency between agent commitment language and actual execution results.
Lack of strict “done/not done” discipline for coding tasks.
Product recommendations:
Add a hard guardrail preventing “implemented/done” claims without minimum tool-backed evidence.
Enforce structured completion output for coding tasks:
files changed
change summary
validation run
If tool-based editing fails, block “done” responses and force explicit failure status.
Add a strict “evidence-first” mode users can enable.
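To make the recommendations above concrete, here is a minimal sketch in Python of what such a guardrail could look like. It is only an illustration: `CompletionReport`, `final_status`, and the `edit_tool_succeeded` flag are hypothetical names I made up, not part of any real Copilot API, and the actual implementation would presumably live inside the agent runtime.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionReport:
    """Hypothetical structured completion output for a coding task."""
    files_changed: list[str] = field(default_factory=list)
    change_summary: str = ""
    validation_run: str = ""  # e.g. the command that was run and its result


def final_status(report: CompletionReport, edit_tool_succeeded: bool) -> str:
    """Return "done" only when every evidence field is filled in.

    Assumption: `edit_tool_succeeded` is a flag the agent runtime would
    supply after its editing tool call; it is not a real Copilot setting.
    """
    # A failed edit must surface as an explicit failure, never as "done".
    if not edit_tool_succeeded:
        return "failed: edit tool did not apply changes"

    # Collect the names of any evidence fields that are still empty.
    missing = [
        name
        for name, value in (
            ("files changed", report.files_changed),
            ("change summary", report.change_summary.strip()),
            ("validation run", report.validation_run.strip()),
        )
        if not value
    ]
    if missing:
        return "not done: missing " + ", ".join(missing)
    return "done"
```

With a check like this, a "done" message would only be possible when the files changed, the change summary, and the validation run are all present; anything else would be reported as "not done" or "failed" with the reason.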
Undelivered ticket snapshot:
Request: WebAdmin edit modal for efficient handling of 500+ records.
Actual final status in this session: not implemented.
Please investigate this behavior, as it materially degrades reliability when users expect agent-mode execution to be trustworthy.
Best regards,
Hervé BOLLI
Extension version: 0.42.2026032404
VS Code version: Code - Insiders 1.113.0-insider (Universal) (59b7fac, 2026-03-23T14:00:24+01:00)
OS version: Darwin arm64 24.6.0
Modes:
System Info
| Item | Value |
|---|---|
| CPUs | Apple M2 (8 x 2400) |
| GPU Status | 2d_canvas: enabled GPU0: VENDOR= 0x106b [Google Inc. (Apple)], DEVICE=0x0000 [ANGLE (Apple, ANGLE Metal Renderer: Apple M2, Version 15.7.3 (Build 24G419))], DRIVER_VENDOR=Apple, DRIVER_VERSION=15.7.3 ACTIVE Machine model name: Mac Machine model version: 14.2 direct_rendering_display_compositor: disabled_off_ok gpu_compositing: enabled multiple_raster_threads: enabled_on opengl: enabled_on rasterization: enabled raw_draw: disabled_off_ok skia_graphite: enabled_on trees_in_viz: disabled_off video_decode: enabled video_encode: enabled webgl: enabled webgl2: enabled webgpu: enabled webnn: disabled_off |
| Load (avg) | 3, 3, 3 |
| Memory (System) | 16.00GB (0.07GB free) |
| Process Argv | --crash-reporter-id 233d2641-0a1d-410a-80a6-dbdaf280569f |
| Screen Reader | no |
| VM | 0% |
A/B Experiments
vsliv368:30146709
pythonvspyt551:31249597
nativeloc1:31118317
dwcopilot:31158714
dwoutputs:31242946
copilot_t_ci:31333650
g012b348:31231168
pythonrdcb7:31268811
pythonpcpt1cf:31399617
6518g693:31302842
63221493:31336333
envsactivate1:31349248
editstats-enabled:31346256
cloudbuttont:31366566
3efgi100_wstrepl:31403338
use-responses-api:31390341
ddidt:31398484
je187915:31401257
cp_cls_c_966_ss:31454199
inlinechat_v2_hd992725:31445440
c0683394:31481635
8hhj4413:31478653
ge8j1254_inline_auto_hint_haiku:31426887
preserve_tokens:31444547
cp_cls_c_1081:31454833
conptydll_true:31480680
ia-use-proxy-models-svc:31446143
a43f0574a:31442822
e9c30283:31453065
test_treatment2:31471001
rl_098a3278:31480644
idci7584:31454084
nes-ftch-new:31458522
nes_chat_context_enabled:31451401
e3e4d672:31454087
showingstats:31457201
ei9d7968:31462942
534a6447:31478742
hg17d649:31455236
nes-extended-on:31455475
regwelcome:31461167
de16b994:31458079
ah:31460422
quick_suggest_off_75197330:31462668
89g7j272:31480197
hhf17803:31462392
bg_compact_t:31477450
t-some:31466503
dfc2j404:31470440
cpptoolson-v2:31475363
thinking_effort_h:31479457
i2gc6536:31472020
ghj88844:31474550
h08i8180:31475367
23c7c724:31481452
client_tst_t:31481009
po_ctrl:31475409
ddid_t:31478204
getcmakediagnosticsoff:31481767
ja75b849:31480103
jah3f675:31480084