CHORE: AGENTS.md added by prasadlohakpure · Pull Request #57 · patterninc/caterpillar

prasadlohakpure · 2026-04-06T05:32:46Z

Description

This pull request adds comprehensive documentation for the caterpillar Go CLI, including its architecture, usage, development conventions, and operational guidelines. The new AGENTS.md file provides onboarding instructions, directory structure, task extension steps, YAML pipeline examples, troubleshooting notes, and PR procedures.

Key documentation additions:

Project Overview and Usage

Describes the purpose of caterpillar, its batch/server modes, tech stack, build/run commands, and directory layout.
Explains how to add new task types, including code patterns, registration, and verification steps.

Pipeline Structure and Task Interface

Details the YAML pipeline structure for both linear and DAG (Directed Acyclic Graph) pipelines, including syntax and configuration options.
Documents the Task interface, conventions for task implementation, and context propagation.

Development and Operational Guidance

Lists known broken packages and test caveats to avoid build errors.
Provides smoke test instructions, SRE/operational model, environment variable requirements, and release caveats.

Pull Request and Agent Workflow

Outlines PR branch naming, title/body format, required checks, and agent permissions for safe repository contributions.

Types of changes

Docs change / refactoring / dependency upgrade
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist

My code follows the code style of this project.
My change requires a change to the documentation and I have updated the documentation accordingly.
I have added tests to cover my changes.

Copilot

Pull request overview

Adds a new AGENTS.md document intended to onboard contributors to the caterpillar Go CLI, describing architecture, task development patterns, pipeline YAML structure (linear + DAG), and operational/PR workflow guidance.

Changes:

Introduces a comprehensive agent/contributor guide for building, running, and extending caterpillar tasks
Documents pipeline YAML syntax (including DAG mode) and task implementation conventions
Adds operational notes, smoke-test instructions, and PR process guidelines


Copilot · 2026-04-06T05:36:56Z

+- **Batch mode** (most pipelines) — runs, processes all records, and exits.
+- **Server mode** — when the pipeline starts with an `http_server` task, the CLI acts as a long-running HTTP server. Incoming requests are converted to `*record.Record` values and emitted downstream to the rest of the pipeline. The binary does not exit until the server shuts down (or `end_after` is configured).
+
+Apply these instructions to `/Users/prasadlohakpure/Desktop/go_projects/src/github.com/patterninc/caterpillar`. Treat paths and commands below as relative to that location unless explicitly stated otherwise.


The doc hard-codes a developer-local absolute path ("/Users/prasadlohakpure/.../caterpillar") and tells readers to treat all paths as relative to it. This is not portable and also leaks a personal workstation path; please rephrase to refer to the repository root (e.g., "from the repo root") or omit the path entirely.

Suggested change

Apply these instructions to `/Users/prasadlohakpure/Desktop/go_projects/src/github.com/patterninc/caterpillar`. Treat paths and commands below as relative to that location unless explicitly stated otherwise.

Apply these instructions from the repository root. Treat paths and commands below as relative to the repository root unless explicitly stated otherwise.

Copilot · 2026-04-06T05:36:56Z

+- **Language:** Go 1.22+
+- **Config format:** YAML (gopkg.in/yaml.v3)
+- **Validation:** go-playground/validator/v10
+- **Testing:** standard `go test`
+- **CI:** GitHub Actions (`.github/workflows/ci.yaml`) — runs `go build ./cmd/caterpillar/caterpillar.go` on every PR to `main`; does not run tests
+


Tech stack lists "Go 1.22+", but the repo is pinned to Go 1.24.7 (`go.mod` declares `go 1.24.7` and CI uses 1.24.7). Please update this to match the actual required Go version so new contributors don't hit toolchain mismatches.

Copilot · 2026-04-06T05:36:56Z

+| `internal/pkg/pipeline/tasks.go` | Task registry (`supportedTasks` map) — add new task types here |
+| `internal/pkg/pipeline/task/` | `Task` interface and `Base` struct shared by all task packages |
+| `internal/pkg/pipeline/task/<name>/` | One directory per task type; each exports `New() (task.Task, error)` |
+| `internal/pkg/pipeline/task/aws/` | AWS-specific tasks (currently `parameter_store`) |


The directory map says the AWS tasks are currently `parameter_store`, but the registered YAML task type key is `aws_parameter_store` (see `internal/pkg/pipeline/tasks.go`). Please align the doc with the actual `type:` value users must put in pipeline YAML.

Suggested change

| `internal/pkg/pipeline/task/aws/` | AWS-specific tasks (currently `parameter_store`) |

| `internal/pkg/pipeline/task/aws/` | AWS-specific tasks (currently YAML type `aws_parameter_store`) |
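The distinction matters because the registry is keyed by the YAML `type:` string, not by the package directory name. A minimal sketch of that registry pattern follows. Only the `aws_parameter_store` key and the `func() (task.Task, error)` constructor shape come from this PR; the one-method `Task` interface, `parameterStore`, and `buildTask` are hypothetical placeholders, so check `internal/pkg/pipeline/tasks.go` for the real code.

```go
package main

import "fmt"

// Task is a trimmed-down, hypothetical stand-in for task.Task.
type Task interface {
	Init() error
}

// parameterStore is a placeholder for the AWS Parameter Store task.
type parameterStore struct{}

func (p *parameterStore) Init() error { return nil }

// newParameterStore follows the documented New() convention: it returns
// (Task, error) even though construction cannot fail.
func newParameterStore() (Task, error) { return &parameterStore{}, nil }

// supportedTasks maps the YAML `type:` value to its constructor.
var supportedTasks = map[string]func() (Task, error){
	"aws_parameter_store": newParameterStore,
}

// buildTask looks up a YAML type key, constructs the task, and calls Init once.
// Per the doc, YAML fields are unmarshaled after construction and before Init;
// that step is omitted here.
func buildTask(typ string) (Task, error) {
	newFn, ok := supportedTasks[typ]
	if !ok {
		return nil, fmt.Errorf("unsupported task type %q", typ)
	}
	t, err := newFn()
	if err != nil {
		return nil, err
	}
	if err := t.Init(); err != nil {
		return nil, err
	}
	return t, nil
}

func main() {
	_, err := buildTask("parameter_store") // package name, not the YAML key
	fmt.Println(err)
	_, err = buildTask("aws_parameter_store")
	fmt.Println(err)
}
```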

Copilot · 2026-04-06T05:36:56Z

+}
+```
+
+`task.Base` satisfies every method except `Run`. Only override a method if you need non-default behavior.


In the Task Interface section, it says `task.Base` satisfies every method except `Run`, but `task.Base` does implement `Run` (default pass-through) in `internal/pkg/pipeline/task/task.go`. Please fix this statement to avoid misleading task authors about what they must implement.

Suggested change

`task.Base` satisfies every method except `Run`. Only override a method if you need non-default behavior.

`task.Base` satisfies every method in this interface, including a default pass-through `Run`. Only override a method when you need behavior different from the default.
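The point of the suggested wording is that embedding `task.Base` already gives a task a working `Run`, so authors override only what they need. A minimal sketch of that Go embedding pattern follows. The `Record` type, the channel-based `Run` signature, and the `upper` task are assumptions for illustration, not copied from `internal/pkg/pipeline/task/task.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for *record.Record.
type Record struct{ Data string }

// Task is a simplified, hypothetical version of the task interface.
type Task interface {
	Init() error
	Run(input <-chan *Record, output chan<- *Record) error
}

// Base provides default implementations, including a pass-through Run.
type Base struct{}

func (b *Base) Init() error { return nil }

func (b *Base) Run(input <-chan *Record, output chan<- *Record) error {
	for r := range input {
		output <- r // forward records unchanged
	}
	return nil
}

// upper embeds Base and overrides only Run; Init is promoted from Base.
type upper struct{ Base }

func (u *upper) Run(input <-chan *Record, output chan<- *Record) error {
	for r := range input {
		output <- &Record{Data: strings.ToUpper(r.Data)}
	}
	return nil
}

// runOnce feeds a single record through t and returns what comes out.
func runOnce(t Task, data string) string {
	in := make(chan *Record, 1)
	out := make(chan *Record, 1)
	in <- &Record{Data: data}
	close(in)
	if err := t.Run(in, out); err != nil {
		panic(err)
	}
	return (<-out).Data
}

func main() {
	fmt.Println(runOnce(&Base{}, "hi"), runOnce(&upper{}, "hi"))
}
```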

Copilot · 2026-04-06T05:36:57Z

+- `New()` signature is always `func() (task.Task, error)` — keep it consistent with the registry even if init cannot fail.
+- `Init()` is called once after unmarshaling, before `Run`. Override it only when you need one-time setup (e.g., creating a client). The `Base.Init()` no-op covers the default case.
+- No `input` channel: task is a **source** (generates records). No `output` channel: task is a **sink**. Both present: transformer.
+- `t.GetRecord(input)` safely reads from the channel and handles nil and close. `t.SendRecord(r, output)` evaluates `context:` JQ expressions and forwards the record.


This line claims t.GetRecord(input) "handles nil and close" (and earlier implies nil-record handling). In code, Base.GetRecord only returns (nil,false) when the channel itself is nil; if a nil record is sent on a non-nil channel it will return (nil,true). Please adjust wording to match the actual behavior (nil channel vs closed channel vs nil record value).

Suggested change

- `t.GetRecord(input)` safely reads from the channel and handles nil and close. `t.SendRecord(r, output)` evaluates `context:` JQ expressions and forwards the record.

- `t.GetRecord(input)` safely reads from the channel; it returns `ok == false` when the input channel is `nil` or closed. If a `nil` record is sent on a non-`nil`, open channel, it returns `r == nil` with `ok == true`. `t.SendRecord(r, output)` evaluates `context:` JQ expressions and forwards the record.
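The three cases in the suggested wording (nil channel, closed channel, nil record on an open channel) can be checked with a small stand-alone function. `getRecord` below is a hypothetical re-implementation written only to match the behavior described in this comment; it is not the actual `Base.GetRecord` source.

```go
package main

import "fmt"

// Record is a hypothetical stand-in for *record.Record.
type Record struct{ Data string }

// getRecord mirrors the described semantics:
//   - nil channel               -> (nil, false), without blocking
//   - closed channel            -> (nil, false)
//   - nil record on open channel -> (nil, true)
func getRecord(input <-chan *Record) (*Record, bool) {
	if input == nil {
		// Receiving from a nil channel blocks forever, so return immediately.
		return nil, false
	}
	r, ok := <-input
	return r, ok
}

func main() {
	_, ok := getRecord(nil)
	fmt.Println("nil channel ok:", ok)

	closed := make(chan *Record)
	close(closed)
	_, ok = getRecord(closed)
	fmt.Println("closed channel ok:", ok)

	open := make(chan *Record, 1)
	open <- nil
	r, ok := getRecord(open)
	fmt.Println("nil record:", r == nil, "ok:", ok)
}
```

Under these semantics, a task loop should stop on `!ok` and still guard separately against `r == nil`.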

Copilot · 2026-04-06T05:36:57Z

+### 2. `internal/pkg/pipeline/task/file/` — production code is fine; test build fails
+
+- **Problem:** `file_success_path_test.go` (untracked) references `resolveSuccessObjectPath` and `writerSchemeFromPath`, which do not yet exist in the production package. The test was written ahead of the implementation.
+- **Status:** `file.go` and `s3.go` build and run correctly. Only the test build is broken.
+- **Action:** Do not run `go test ./internal/pkg/pipeline/task/file/`. The production package is safe to import and extend.
+
+### 3. Root package — scratch files with duplicate `main()` declarations
+
+- **Problem:** `push_sqs_localstack.go` and `push_kafka_message.go` both declare `package main` with a `func main()`, causing a duplicate-symbol error if you compile the root package.
+- **Status:** Untracked (not committed). Used locally for manual testing.
+- **Action:** Do not delete without asking. Do not attempt `go build .` at the repo root.
+


The "Known Broken Packages" section describes several paths and files as untracked/uncommitted (e.g., `internal/pkg/pipeline/task/kinesis/`, `push_sqs_localstack.go`, and `file_success_path_test.go`), but these do not exist in the repository tree. As written, this is misleading and discourages running `go build ./...` or `go test ./...` without a repo-backed reason. Please either remove this section or rewrite it to reference only tracked files and current, reproducible build limitations.

Suggested change

### 2. `internal/pkg/pipeline/task/file/` — production code is fine; test build fails

- **Problem:** `file_success_path_test.go` (untracked) references `resolveSuccessObjectPath` and `writerSchemeFromPath`, which do not yet exist in the production package. The test was written ahead of the implementation.

- **Status:** `file.go` and `s3.go` build and run correctly. Only the test build is broken.

- **Action:** Do not run `go test ./internal/pkg/pipeline/task/file/`. The production package is safe to import and extend.

### 3. Root package — scratch files with duplicate `main()` declarations

- **Problem:** `push_sqs_localstack.go` and `push_kafka_message.go` both declare `package main` with a `func main()`, causing a duplicate-symbol error if you compile the root package.

- **Status:** Untracked (not committed). Used locally for manual testing.

- **Action:** Do not delete without asking. Do not attempt `go build .` at the repo root.

Copilot · 2026-04-06T05:36:57Z

+Caterpillar is a CLI tool, not a long-running server. There is no process to restart, no service to scale, and no health endpoint to query.
+
+**During an incident:**
+1. Check CI logs first: `gh run list --repo patterninc/caterpillar` then `gh run view <run-id>` to inspect a specific run.
+2. If CI is passing and runtime behavior is wrong, check the state of the relevant AWS service (S3 bucket access, SQS queue depth, SSM parameter existence) using the AWS Console or CLI.
+3. There is no caterpillar daemon to restart — re-running the binary with a corrected YAML is the recovery action.


The SRE/Operational section says "Caterpillar is a CLI tool, not a long-running server," but earlier the doc describes a supported "Server mode" via the http_server task where the process runs until shutdown. Please reconcile these statements (e.g., clarify it's usually batch/CLI, but can run long-lived when configured as an HTTP server).

Suggested change

Caterpillar is a CLI tool, not a long-running server. There is no process to restart, no service to scale, and no health endpoint to query.

**During an incident:**

1. Check CI logs first: `gh run list --repo patterninc/caterpillar` then `gh run view <run-id>` to inspect a specific run.

2. If CI is passing and runtime behavior is wrong, check the state of the relevant AWS service (S3 bucket access, SQS queue depth, SSM parameter existence) using the AWS Console or CLI.

3. There is no caterpillar daemon to restart — re-running the binary with a corrected YAML is the recovery action.

Caterpillar is usually operated as a CLI/batch tool, not as a continuously running service. Most pipelines run, process all records, and exit, so there is typically no service to scale and no health endpoint to query. However, when the pipeline starts with an `http_server` task, the same binary runs as a long-lived HTTP server until shutdown (or until `end_after` is reached).

**During an incident:**

1. Check CI logs first: `gh run list --repo patterninc/caterpillar` then `gh run view <run-id>` to inspect a specific run.

2. If CI is passing and runtime behavior is wrong, check the state of the relevant AWS service (S3 bucket access, SQS queue depth, SSM parameter existence) using the AWS Console or CLI.

3. For batch pipelines, re-running the binary with a corrected YAML is the recovery action. For `http_server` pipelines, treat it like a long-running process: inspect the running server configuration/logs and restart the process if needed.

Shaunak Sontakke (ssontakke) · 2026-04-06T08:03:07Z

+
+The binary can operate in two modes depending on the pipeline:
+- **Batch mode** (most pipelines) — runs, processes all records, and exits.
+- **Server mode** — when the pipeline starts with an `http_server` task, the CLI acts as a long-running HTTP server. Incoming requests are converted to `*record.Record` values and emitted downstream to the rest of the pipeline. The binary does not exit until the server shuts down (or `end_after` is configured).


Do we want to call it Caterpillar Service mode to have a common vocabulary for that?

Shaunak Sontakke (ssontakke) · 2026-04-06T08:05:26Z

+
+## Tech Stack
+
+- **Language:** Go 1.22+


Can we instruct it to read the Go version from the `go.mod` file, so this doesn't become outdated when someone updates the version?
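One way to follow the suggestion is to read the required version from `go.mod` rather than hard-coding it; from the command line, `go mod edit -json` prints the module's `Go` field. The stand-alone parser below is a hypothetical helper that extracts the `go` directive using only the standard library.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// goDirective returns the version from the `go` directive of a go.mod file,
// e.g. "1.24.7" for a line reading "go 1.24.7".
func goDirective(gomod string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "go" {
			return fields[1], nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("no go directive found")
}

func main() {
	// In practice, pass the contents of os.ReadFile("go.mod").
	v, err := goDirective("module example.com/demo\n\ngo 1.24.7\n")
	fmt.Println(v, err)
}
```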

Shaunak Sontakke (ssontakke) · 2026-04-06T08:07:02Z

+```bash
+go build ./cmd/caterpillar/caterpillar.go
+```
+


Add a step 5 for testing and a step 6 for updating documentation.

Refer to this for ideas and incorporate them here: https://github.com/patterninc/caterpillar/pull/47/changes

Shaunak Sontakke (ssontakke) · 2026-04-06T08:11:03Z

+### Before opening a PR
+
+1. Build must pass: `go build ./cmd/caterpillar/caterpillar.go`
+2. If you added a new task type, confirm it is registered in `tasks.go` and has a `README.md` in its package directory.


Unit tests are a must.

Shaunak Sontakke (ssontakke) · 2026-04-06T08:14:51Z

+**Do NOT run `go build ./...`** — three packages currently fail to build (see Known Broken Packages).
+
+## Directory Map
+


The directory structure is adopted from https://github.com/golang-standards/project-layout

Shaunak Sontakke (ssontakke) · 2026-04-06T08:16:05Z

+| `docs/` | Onboarding checklist, code-mint framework docs, outcomes, skills-status |
+| `.agents/` | code-mint AI infrastructure (skills, rules, reports, status JSON) |


We need to create these folders.

Shaunak Sontakke (ssontakke)

Good initiative.

Shaunak Sontakke (ssontakke) · 2026-04-06T08:17:18Z

+
+```bash
+go build ./cmd/caterpillar/caterpillar.go
+```


Also add linting and formatting instructions, if we have any.
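If no dedicated linter is configured, the standard Go tooling covers the basics: `gofmt -l .` lists files whose formatting differs from gofmt's, and `go vet` reports suspicious constructs. For a programmatic check, the standard library's `go/format` package applies the same formatting rules; `needsGofmt` below is a hypothetical helper built on it, not something that exists in the repo.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// needsGofmt reports whether src differs from its gofmt-formatted form.
// It returns an error if src is not syntactically valid Go.
func needsGofmt(src []byte) (bool, error) {
	formatted, err := format.Source(src)
	if err != nil {
		return false, err
	}
	return !bytes.Equal(src, formatted), nil
}

func main() {
	for _, src := range []string{
		"package p\n\nfunc f() {}\n",
		"package p\n\nfunc f(){ return }\n",
	} {
		dirty, err := needsGofmt([]byte(src))
		fmt.Printf("%q needs gofmt: %v (err: %v)\n", src, dirty, err)
	}
}
```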

CHORE: AGENTS.md added

b185a7b

prasadlohakpure requested a review from a team as a code owner April 6, 2026 05:32

Copilot AI review requested due to automatic review settings April 6, 2026 05:32

Copilot started reviewing on behalf of prasadlohakpure April 6, 2026 05:33

Copilot AI reviewed Apr 6, 2026

Shaunak Sontakke (ssontakke) reviewed Apr 6, 2026

Merge branch 'main' into chore/agents_md

fce408c

4 participants