Global chat job code#495

Merged
josephjclark merged 22 commits into main from global-chat-job-code on May 27, 2026

@hanna-paasivirta hanna-paasivirta commented May 20, 2026

Short Description

Testing improvements and bug fixes for user scenarios in which job code edits target only the step the user is viewing (i.e. not several steps, not a step the user is not viewing, and not the workflow structure).

Fixes #490

Implementation Details

New acceptance specs for global_chat

  • Added one_shot_workflows/ specs exercising the planner path on from-scratch workflow requests of varying ambiguity.
  • Added job_code/ specs (8 scenarios) exercising the job-code path on existing workflows — first-turn questions, multi-turn conversations, log-driven debugging, "what does this step do", and code edits with varying history depth.

Architecture / planning docs

  • Added agent-team-architecture-plan/ with the design write-ups for the four test tiers and an example spec format, so future contributors know where unit / service / integration / acceptance tests should live.

Bug fix in planner job-code stitching

  • Fixed services/global_chat/planner.py: when the planner's job_key needed fuzzy-matching to find the YAML job, the original key was still being passed to stitch_job_code (which requires an exact match), so the stitch silently failed and code edits were lost. The planner now captures the matched key and uses it for stitching. This affected both the single and parallel job-code-tool execution paths.
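
To illustrate the failure mode described above, here is a minimal sketch rather than the actual planner code. The helper names `find_job_key` and `apply_edit`, the shape of the `jobs` dict, and the signature of the `stitch_job_code` stand-in are all assumptions made for the example; fuzzy matching is approximated with `difflib.get_close_matches`, which may not be what planner.py uses.

```python
from difflib import get_close_matches


def find_job_key(job_key, jobs):
    """Hypothetical helper: return the key in `jobs` that matches `job_key`.

    Tries an exact match first, then falls back to a fuzzy match.
    """
    if job_key in jobs:
        return job_key
    matches = get_close_matches(job_key, list(jobs), n=1, cutoff=0.6)
    return matches[0] if matches else None


def stitch_job_code(jobs, job_key, new_code):
    """Stand-in for the real function (signature assumed).

    Requires an exact key match and reports failure only via its return
    value, which is how an edit can be lost silently.
    """
    if job_key not in jobs:
        return False
    jobs[job_key]["body"] = new_code
    return True


def apply_edit(jobs, planner_key, new_code):
    """Resolve the planner's key, then stitch using the *matched* key."""
    matched_key = find_job_key(planner_key, jobs)
    if matched_key is None:
        return False
    # The bug was equivalent to passing planner_key here instead.
    return stitch_job_code(jobs, matched_key, new_code)


jobs = {"update-mailchimp": {"body": "// old code"}}
apply_edit(jobs, "update_mailchimp", "// new code")
```

In this sketch, calling `stitch_job_code` directly with the planner's unmatched key returns `False` and leaves the job untouched, while `apply_edit` updates the job under its real key.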

Prompt fixes

  • Tweaked the prompt in global_chat/prompts.yaml to firm up router step selection.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy


# notes

Reproduces a real session in which the assistant returned only "..." after ~19s. User is on the update-mailchimp step (adaptor `@openfn/language-mailchimp@latest`) of a multi-step workflow that also has Google Sheets and Gmail steps. Prior conversation generated the Mailchimp campaign code and explained the spread operator. The current user turn is an open-ended teach-me request that names "the hardest function for this adaptor" — phrasing is ambiguous and the model's behaviour here has historically been unreliable.

Collaborator commented:

I definitely think that acceptance tests aren't the place for this sort of thing. It's an integration test.

But it's OK for now since we don't have integration tests. We'll review later.


## content

What does the cursor() call at the top of this job actually do? I didn't write it and I'm not sure why it's there.

Collaborator commented:

When I'm reviewing these tests, it often feels useful to have the question be the first thing in the file.


# notes

Multi-turn conversation entirely about the same job code. The user has already asked two clarifying questions about the upsert step in a Salesforce contact sync and now asks a follow-up that requires keeping prior context in mind ("you mentioned external IDs — show me how to pick one"). The router should send this to job_code_agent. The response should pick up the thread from the prior assistant message rather than restarting the explanation. Inline code snippets in chat text are fine; the thing to avoid is generating a `code_edits` / `job_code` attachment that renders as an intrusive diff in the editor.

Collaborator commented:

> the thing to avoid is generating a `code_edits` / `job_code` attachment that renders as an intrusive diff in the editor.

Is that right? The user has asked for an edit, so why are we telling the model not to send an attachment?

I ask because this test is failing locally for me
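
For reference, the expectation being questioned could be expressed as a check along these lines. This is only a sketch: the response shape (an `attachments` list whose items carry a `type` field) and the helper name `has_code_attachment` are assumptions, not the service's actual payload or test harness.

```python
# Attachment types named in the spec notes; the payload shape is assumed.
CODE_ATTACHMENT_TYPES = {"code_edits", "job_code"}


def has_code_attachment(response):
    """Return True if a (hypothetical) response dict carries a code attachment."""
    attachments = response.get("attachments") or []
    return any(a.get("type") in CODE_ATTACHMENT_TYPES for a in attachments)


# A chat-only reply with an inline snippet would pass the spec as written.
chat_only = {"response": "Use `Email__c` as the external ID.", "attachments": []}
# A reply that attaches an edit would fail it.
with_edit = {"response": "Done.", "attachments": [{"type": "code_edits"}]}
```

Whether the spec should require `has_code_attachment(...)` to be false for this turn is exactly the open question in the comment above.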

@josephjclark josephjclark merged commit 5f1c144 into main May 27, 2026
2 checks passed
@josephjclark josephjclark deleted the global-chat-job-code branch May 27, 2026 15:39

Development

Successfully merging this pull request may close these issues.

Global agent: Job code edits relating to a single step

2 participants