Global chat job code by hanna-paasivirta · Pull Request #495 · OpenFn/apollo

hanna-paasivirta · 2026-05-20T17:55:40Z

Short Description

Testing improvements and bug fixes for a set of user scenarios targeting job code edits relating to just the step that the user is viewing (i.e. not several steps, not another step that the user is not viewing, and not the workflow structure).

Fixes #490

Implementation Details

New acceptance specs for global_chat

Added one_shot_workflows/ specs exercising the planner path on from-scratch workflow requests of varying ambiguity.
Added job_code/ specs (8 scenarios) exercising the job-code path on existing workflows — first-turn questions, multi-turn conversations, log-driven debugging, "what does this step do", and code edits with varying history depth.

Architecture / planning docs

Added agent-team-architecture-plan/ with the design write-ups for the four test tiers and an example spec format, so future contributors know where unit / service / integration / acceptance tests should live.

Bug fix in planner job-code stitching

Fixed services/global_chat/planner.py: when the planner's job_key needed fuzzy matching to find the YAML job, the original key was still passed to stitch_job_code (which requires an exact match), so the stitch silently failed and code edits were lost. The planner now captures the matched key and uses it for stitching. This affected both the single and parallel job-code-tool execution paths.
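
The shape of this fix can be sketched roughly as follows. This is a minimal illustration, not the code in planner.py: `resolve_job_key` and `apply_edit` are hypothetical helpers, the `stitch_job_code` shown here is a stand-in with an assumed signature, and the use of `difflib` for fuzzy matching is an assumption made only so the example runs.

```python
from difflib import get_close_matches


def resolve_job_key(requested_key, yaml_jobs):
    """Hypothetical helper: return the key in yaml_jobs that best matches
    requested_key, or None if nothing is close enough."""
    if requested_key in yaml_jobs:
        return requested_key
    # Assumption: difflib stands in for whatever fuzzy matcher the planner uses.
    matches = get_close_matches(requested_key, list(yaml_jobs), n=1, cutoff=0.6)
    return matches[0] if matches else None


def stitch_job_code(yaml_jobs, job_key, new_code):
    """Stand-in for the real function: requires an exact key match and
    reports failure by returning False instead of raising."""
    if job_key not in yaml_jobs:
        return False
    yaml_jobs[job_key]["body"] = new_code
    return True


def apply_edit(yaml_jobs, requested_key, new_code):
    """Resolve the key first, then stitch with the matched key
    rather than the key the planner originally asked for."""
    matched_key = resolve_job_key(requested_key, yaml_jobs)
    if matched_key is None:
        return False
    return stitch_job_code(yaml_jobs, matched_key, new_code)


jobs = {"update-mailchimp": {"body": "old code"}}

# Passing the unmatched key straight through reproduces the silent failure.
stitched_with_original_key = stitch_job_code(jobs, "update_mailchimp", "new code")

# Resolving the key first lets the exact-match stitch succeed.
stitched_with_matched_key = apply_edit(jobs, "update_mailchimp", "new code")
```

In this sketch the first call leaves the job body untouched, while the second updates it, which mirrors the difference between the old and new behaviour described above.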

Prompt fixes

Tweaked the prompt in global_chat/prompts.yaml to firm up router step selection.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

You can read more details in our Responsible AI Policy

josephjclark · 2026-05-27T14:48:59Z

+
+# notes
+
+Reproduces a real session in which the assistant returned only "..." after ~19s. User is on the update-mailchimp step (adaptor `@openfn/language-mailchimp@latest`) of a multi-step workflow that also has Google Sheets and Gmail steps. Prior conversation generated the Mailchimp campaign code and explained the spread operator. The current user turn is an open-ended teach-me request that names "the hardest function for this adaptor" — phrasing is ambiguous and the model's behaviour here has historically been unreliable.


I definitely think acceptance tests aren't the place for this sort of thing. It's an integration test.

But it's OK for now since we don't have integration tests. We'll review later.

josephjclark · 2026-05-27T14:53:03Z

+
+## content
+
+What does the cursor() call at the top of this job actually do? I didn't write it and I'm not sure why it's there.


When I'm reviewing these tests, it often feels useful to have the question be the first thing in the file.

josephjclark · 2026-05-27T15:03:56Z

+
+# notes
+
+Multi-turn conversation entirely about the same job code. The user has already asked two clarifying questions about the upsert step in a Salesforce contact sync and now asks a follow-up that requires keeping prior context in mind ("you mentioned external IDs — show me how to pick one"). The router should send this to job_code_agent. The response should pick up the thread from the prior assistant message rather than restarting the explanation. Inline code snippets in chat text are fine; the thing to avoid is generating a `code_edits` / `job_code` attachment that renders as an intrusive diff in the editor.


the thing to avoid is generating a code_edits / job_code attachment that renders as an intrusive diff in the editor.

Is that right? The user has asked for an edit, so why are we telling the model not to send an attachment?

I ask because this test is failing locally for me.

hanna-paasivirta added 20 commits May 12, 2026 01:40

add plan

fab5c59

add base acceptance architecture

d37c579

move qualitative tests

3f7711e

rewrite to md without asserts

6b5ac26

udpate model usage and readme

d149e86

clean

b2c02c2

add judges

de27a1c

edit judges for service prompt consistency

c62acf0

specify judge

22a1f97

fix prefix

7a5c272

add sumary

254619b

parallel judge

ff0bf52

add oneshot folder

23841d0

fix prompt and router step selection

36fe0b5

add job code tests

6777002

fix job key matching

942aaee

Merge remote-tracking branch 'origin/main' into global-chat-job-code

dc814e1

adjust tests and job chat prompt

3ed80de

add test for unsolved bug

07f91a8

revert prompt change discouraging unsolicited code

cb051ca

hanna-paasivirta marked this pull request as ready for review May 21, 2026 16:21

hanna-paasivirta mentioned this pull request May 21, 2026

Job_chat: Model outputs an empty or partial response #497

Open

josephjclark reviewed May 27, 2026


josephjclark added 2 commits May 27, 2026 16:14

changeset

33d00be

version

e702cfb

josephjclark merged commit 5f1c144 into main May 27, 2026
2 checks passed

josephjclark deleted the global-chat-job-code branch May 27, 2026 15:39

josephjclark merged 22 commits into main from global-chat-job-code
