feat: cancel deployment #5274

Open
chronark wants to merge 1 commit into main from cancel-deployment

Conversation

@chronark
Collaborator

@chronark chronark commented Mar 10, 2026

Stores the Restate invocation ID of a deployment so that it can be cancelled/killed later.

I did not review this yet
@vercel

vercel bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project   | Deployment | Actions          | Updated (UTC)
dashboard | Ready      | Preview, Comment | Mar 10, 2026 7:09am


@coderabbitai
Contributor

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough


This pull request introduces a cancel deployment feature across the entire application stack. The changes add database support for tracking invocation IDs and a cancelled deployment status, extend the control service protobuf with a new CancelDeployment RPC, implement the corresponding backend handler, integrate Restate admin client for invocation termination, and update the frontend to display and trigger cancellation UI.

Changes

Cohort / File(s) Summary
Database Schema & Models
pkg/db/schema.sql, pkg/db/models_generated.go, web/internal/db/src/schema/deployments.ts
Added invocation_id varchar column to deployments table and introduced DeploymentsStatusCancelled enum value alongside existing deployment statuses.
Generated SQL Queries
pkg/db/deployment_find_by_id.sql_generated.go, pkg/db/deployment_find_by_k8s_name.sql_generated.go, pkg/db/deployment_list_by_environment_id_and_status.sql_generated.go, pkg/db/deployment_topology_list_by_versions.sql_generated.go, pkg/db/deployment_topology_list_desired.sql_generated.go
Updated SELECT statements and row scanning to include invocation_id column in all deployment query results; added InvocationID field population across multiple generated query handlers.
Deployment Update Generated Code
pkg/db/queries/deployment_update_invocation_id.sql, pkg/db/deployment_update_invocation_id.sql_generated.go, pkg/db/querier_generated.go, pkg/db/BUILD.bazel
New SQL query and generated Go code to update deployments' invocation_id and updated_at fields; added UpdateDeploymentInvocationID method to Querier interface and generated implementation.
Restate Admin Client
pkg/restate/admin/client.go
Added KillInvocation method to terminate Restate invocations; refactored invocation termination logic into shared terminateInvocation helper supporting both cancel and kill modes; broadened success conditions to accept any 2xx status.
Control Service Protocol
svc/ctrl/proto/ctrl/v1/deployment.proto
Added DEPLOYMENT_STATUS_CANCELLED enum value, introduced CancelDeploymentRequest and CancelDeploymentResponse messages, and added CancelDeployment RPC to DeployService.
Deployment Service Core
svc/ctrl/services/deployment/service.go, svc/ctrl/services/deployment/cancel_deployment.go, svc/ctrl/services/deployment/create_deployment.go, svc/ctrl/services/deployment/get_deployment.go
Added RestateAdmin client field to Service config and initialization; implemented new CancelDeployment handler with status validation and invocation termination; updated deployment creation to persist invocation IDs to database; extended status conversion to handle cancelled state.
Deployment Build Configuration
svc/ctrl/services/deployment/BUILD.bazel, svc/ctrl/api/run.go
Added cancel_deployment.go source file and //pkg/restate/admin dependency; wired RestateAdmin client into deployment service configuration.
Worker Workflows
svc/ctrl/worker/deploy/deploy_handler.go, svc/ctrl/worker/githubwebhook/handle_push.go
Inserted two 30-second delays in Deploy workflow; modified webhook handler to capture and persist deployment invocation IDs from Restate workflow responses.
Control API
svc/api/routes/v2_deploy_get_deployment/handler.go, svc/api/openapi/..., svc/api/internal/testutil/mock_deployment_client.go
Added handling for DeploymentsStatusCancelled status in response mapping; added CANCELLED to OpenAPI deployment status enums; extended mock deployment client with CancelDeployment method and tracking fields.
Logging Utilities
svc/frontline/services/router/BUILD.bazel, svc/frontline/services/router/service.go
Added //pkg/logger dependency and implemented error logging with environment context in sentinel lookup failures.
Frontend Cancel Dialog Component
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx
New React component rendering a confirmation dialog for deployment cancellation; integrates TRPC mutation, handles success/error states, and manages loading feedback via toast notifications.
Frontend Action Menu & Filters
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx, web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts
Added cancel action to deployment menu with conditional enablement based on deployment status; expanded DEPLOYMENT_STATUSES and GROUPED_DEPLOYMENT_STATUSES to include cancelled; updated filter color mapping and status expansion logic.
Frontend Status Display
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/deployment-status-badge.tsx, web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx
Added cancelled status badge configuration with Ban icon and gray styling to deployment status badge components; updated status type unions to include cancelled.
Frontend Deployment List & Details
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/deployments-list.tsx, web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/domain_list.tsx, web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/utils/get-row-class.ts
Refactored terminal state logic to treat cancelled and failed as equivalent for resource display suppression and placeholder rendering; unified styling for both terminal statuses.
Frontend Deployment Progress
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx
Introduced isCancelled flag to distinguish cancelled from failed deployments; updated step descriptions and final messaging to reflect cancellation context; refined error banner styling and text based on actual failure versus cancellation.
Frontend TRPC Mutation & Router
web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts, web/apps/dashboard/lib/trpc/routers/index.ts
New cancel mutation procedure validating deployment access, checking cancellable statuses, invoking control service gRPC, applying rate limiting, and returning status; exposed cancel endpoint in deployment router.
Frontend Schema
web/apps/dashboard/lib/collections/deploy/deployments.ts
Extended Deployment schema to include cancelled status value in the status enum.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check (⚠️ Warning): The PR description is minimal and lacks critical required sections from the template, including issue reference, type of change selection, testing instructions, and checklist items, though it does convey the core intent.
    Resolution: Add the missing template sections: reference an issue, select the type of change, describe test procedures, and complete the required checklist items before merging.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 60.00%, below the required threshold of 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
  • Title check (✅ Passed): The title 'feat: cancel deployment' directly addresses the main feature introduced throughout the changeset: the ability to cancel deployments by storing invocation IDs and implementing cancel logic.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Command failed




@chronark
Collaborator Author

cries in merge conflicts

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts (1)

56-69: ⚠️ Potential issue | 🟡 Minor

ready still falls through to the default color.

Line 57 checks "completed", but this file exposes "ready" as the grouped terminal status. Ready filters will still render with bg-info-9 after this change.

💡 Minimal fix
-      if (value === "completed") {
+      if (value === "ready") {
         return "bg-success-9";
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts
around lines 56 - 69, getColorClass currently only checks for "completed" and
leaves the grouped terminal status "ready" falling through to the default;
update getColorClass to treat "ready" as a terminal/completed status (e.g.,
return the same "bg-success-9" as "completed") by adding a check for value ===
"ready" or combining it with the existing "completed" condition so ready filters
render with the correct green background.
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx (1)

21-53: ⚠️ Potential issue | 🟡 Minor

Cancelled deployments continue polling indefinitely.

Lines 28 and 38 refetch both queries every second unconditionally, but line 46 stops the local timer when isCancelled is true. Since cancelled deployments remain on this page without automatic redirect, the client will keep hitting both endpoints until the user navigates away.

Minimal fix
  const steps = trpc.deploy.deployment.steps.useQuery(
    {
      deploymentId: deployment.id,
    },
    {
-      refetchInterval: 1_000,
+      refetchInterval: isCancelled ? false : 1_000,
    },
  );
@@
  const buildSteps = trpc.deploy.deployment.buildSteps.useQuery(
    {
      deploymentId: deployment.id,
      includeStepLogs: true,
    },
    {
-      refetchInterval: 1_000,
+      refetchInterval: isCancelled ? false : 1_000,
    },
  );

This pattern is already used in the codebase (e.g., use-fetch-timeseries.ts).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx
around lines 21 - 53, The steps and buildSteps queries are refetching every
second unconditionally causing continued polling for cancelled deployments;
update the trpc.deploy.deployment.steps.useQuery and
trpc.deploy.deployment.buildSteps.useQuery calls to disable polling when
isCancelled (or equivalently add enabled: !isCancelled or set refetchInterval:
undefined when isCancelled) so they stop refetching for cancelled deployments,
and remove any reliance on the local now timer in the queries (leave the
useEffect for now display untouched) — target the steps and buildSteps query
option objects to conditionally set refetchInterval or enabled based on the
isCancelled boolean.
🧹 Nitpick comments (3)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx (1)

35-42: Don’t swallow mutation failures in handleCancel.

mutateAsync already routes errors through onError; the extra .catch only logs and suppresses explicit failure handling at the call site.

💡 Proposed fix
   const handleCancel = async () => {
-    await cancelDeployment
-      .mutateAsync({
-        deploymentId: selectedDeployment.id,
-      })
-      .catch((error) => {
-        console.error("Cancel error:", error);
-      });
+    await cancelDeployment.mutateAsync({
+      deploymentId: selectedDeployment.id,
+    });
   };
As per coding guidelines: "Use `fault` library for error handling".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx
around lines 35 - 42, The current handleCancel swallows errors by attaching a
.catch to cancelDeployment.mutateAsync; remove that .catch so errors can
propagate to the mutation's onError handler, and instead route failures through
the project's fault error-handling mechanism (e.g., wrap or capture the promise
with the fault library as per guidelines) so failures are reported but not
suppressed; update handleCancel to call cancelDeployment.mutateAsync({
deploymentId: selectedDeployment.id }) without the .catch and integrate fault
(using the project's fault API) to record/handle any thrown error.
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx (1)

40-48: Type the cancellable-status list as deployment statuses.

This currently relies on a plain string array; tightening it to Deployment["status"] avoids silent typos and keeps illegal states harder to represent.

💡 Proposed fix
-    const cancellableStatuses = [
+    const cancellableStatuses: ReadonlySet<Deployment["status"]> = new Set([
       "pending",
       "starting",
       "building",
       "deploying",
       "network",
       "finalizing",
-    ];
-    const canCancel = cancellableStatuses.includes(selectedDeployment.status);
+    ]);
+    const canCancel = cancellableStatuses.has(selectedDeployment.status);
As per coding guidelines: "Make illegal states unrepresentable by modeling domain with discriminated unions and parsing inputs at boundaries into typed structures".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx
around lines 40 - 48, The cancellableStatuses array is untyped and can silently
accept typos; change it to a typed array of the deployment status union (e.g.,
declare cancellableStatuses as Deployment["status"][] and import the Deployment
type) so the compiler enforces valid statuses, then keep the existing canCancel
check (const canCancel =
cancellableStatuses.includes(selectedDeployment.status)) to work with the typed
statuses; ensure the literal values exactly match members of
Deployment["status"] or adjust to use a const assertion if needed.
svc/ctrl/services/deployment/service.go (1)

33-45: Reject nil RestateAdmin in New.

CancelDeployment already treats a missing admin client as misconfiguration, but New still constructs Service in that state. Fail fast here so the service cannot start partially configured.

As per coding guidelines, "Make illegal states unrepresentable by modeling domain with ADTs/discriminated unions and parsing inputs at boundaries into typed structures".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/services/deployment/service.go` around lines 33 - 45, New currently
allows constructing a Service with a nil RestateAdmin, which defers a
configuration error to runtime; update New to validate cfg.RestateAdmin and fail
fast: check cfg.RestateAdmin for nil at the start of New and return a clear
fatal error (panic or log.Fatalf) with context naming RestateAdmin (or change
New's signature to return (*Service, error) and return an error) so the Service
cannot be constructed in a partially configured state; reference the New
function, Service struct and cfg.RestateAdmin when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/restate/admin/client.go`:
- Around line 105-125: The terminateInvocation function should reject empty
invocationID before constructing the URL and interpreting a 404 as success: add
a guard at the top of Client.terminateInvocation that returns a clear error
(e.g., fmt.Errorf or errors.New) when invocationID == "" so you never call c.do
or map a 404 to nil for a missing ID; keep the rest of the logic (URL formation,
response handling) unchanged and reference the same function name
terminateInvocation to locate where to add the check.

In `@svc/ctrl/services/deployment/create_deployment.go`:
- Around line 300-307: The code treats persisting the invocation ID
(db.Query.UpdateDeploymentInvocationID call using parameters ID=deploymentID and
InvocationID=sql.NullString{String: invocation.Id()}) as best-effort; instead
make persistence mandatory: after creating the workflow/invocation, if
UpdateDeploymentInvocationID returns an error, either retry the DB update a few
times with backoff or cancel/terminate the started invocation and return an
error to the caller so the deployment can be retried/rolled back. Concretely,
wrap the db.Query.UpdateDeploymentInvocationID call in a retry loop (with small
sleep/backoff) and only proceed/return success once it succeeds; if retries
exhaust, call the invocation cancel/terminate API (using invocation.Id()) and
return an error (no longer just logger.Warn) so the caller knows persistence
failed.

In `@svc/ctrl/worker/deploy/deploy_handler.go`:
- Around line 186-187: Replace the blocking time.Sleep call with a cancellable,
durable timer by using restate.Sleep(ctx, 30*time.Second) (or remove the sleep
if it was only for debugging); locate the blocking call in deploy_handler.go
(e.g., inside the deploy handler function where time.Sleep(30 * time.Second) is
used) and swap it to restate.Sleep(ctx, 30*time.Second) so the operation
respects the passed context and Restate retry/cancellation semantics.

In `@svc/ctrl/worker/githubwebhook/handle_push.go`:
- Around line 185-197: The persistence of invocationID (invocationID :=
inv.GetInvocationId()) must not be downgraded to a warn; if
UpdateDeploymentInvocationID (db.Query.UpdateDeploymentInvocationID) fails the
handler should surface/handle the error so the workflow retains a cancel handle.
Replace the logger.Warn path: make the call retryable or return the error from
the surrounding function (i.e., propagate the error instead of swallowing it).
Use a retry/backoff loop or the existing restate.RunVoid wrapper with retry
semantics (or wrap UpdateDeploymentInvocationID in a retry helper) and if
retries ultimately fail return an error (or abort the request) rather than
logging and continuing; keep references to invocationID, restate.RunVoid and
UpdateDeploymentInvocationID to locate the change.

In web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx:
- Around line 18-24: The onSuccess handler currently calls utils.invalidate()
and fires collection.deployments.utils.refetch() without awaiting it, causing
racey UI; update the handler to scope invalidation/refetch to the deployments
query (use collection.deployments.utils.invalidate() or
collection.deployments.utils.refetch()) and await the promise(s) (await
collection.deployments.utils.refetch() and await any async invalidate call)
inside a try/catch before closing the dialog or showing success to ensure the
deployments list is refreshed when the UI updates.

In `@web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts`:
- Around line 75-85: The current catch on ctrl.cancelDeployment({ deploymentId:
input.deploymentId }) swallows control-plane errors and converts everything to a
generic INTERNAL_SERVER_ERROR with message: err; instead, detect and rethrow the
original error when it already represents a TRPCError (or has a code-like
property) so NOT_FOUND/FAILED_PRECONDITION are preserved, and when wrapping is
necessary create a TRPCError using a safe string message (e.g., err.message or
String(err)) rather than the raw err object; update the catch around
ctrl.cancelDeployment to check for err instanceof TRPCError (or err.code) and
throw err directly, otherwise throw new TRPCError({ code:
"INTERNAL_SERVER_ERROR", message: err.message ?? String(err) }).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a6fee186-d49e-47a4-ad8d-31a20d780ffd

📥 Commits

Reviewing files that changed from the base of the PR and between 25c27f5 and 04548e9.

⛔ Files ignored due to path filters (4)
  • gen/proto/ctrl/v1/ctrlv1connect/deployment.connect.go is excluded by !**/gen/**
  • gen/proto/ctrl/v1/deployment.pb.go is excluded by !**/*.pb.go, !**/gen/**
  • gen/rpc/ctrl/deployment_generated.go is excluded by !**/gen/**
  • web/apps/dashboard/gen/proto/ctrl/v1/deployment_pb.ts is excluded by !**/gen/**
📒 Files selected for processing (41)
  • pkg/db/BUILD.bazel
  • pkg/db/deployment_find_by_id.sql_generated.go
  • pkg/db/deployment_find_by_k8s_name.sql_generated.go
  • pkg/db/deployment_list_by_environment_id_and_status.sql_generated.go
  • pkg/db/deployment_topology_list_by_versions.sql_generated.go
  • pkg/db/deployment_topology_list_desired.sql_generated.go
  • pkg/db/deployment_update_invocation_id.sql_generated.go
  • pkg/db/models_generated.go
  • pkg/db/querier_generated.go
  • pkg/db/queries/deployment_update_invocation_id.sql
  • pkg/db/schema.sql
  • pkg/restate/admin/client.go
  • svc/api/internal/testutil/mock_deployment_client.go
  • svc/api/openapi/gen.go
  • svc/api/openapi/openapi-generated.yaml
  • svc/api/openapi/spec/paths/v2/deploy/getDeployment/V2DeployGetDeploymentResponseData.yaml
  • svc/api/routes/v2_deploy_get_deployment/handler.go
  • svc/ctrl/api/run.go
  • svc/ctrl/proto/ctrl/v1/deployment.proto
  • svc/ctrl/services/deployment/BUILD.bazel
  • svc/ctrl/services/deployment/cancel_deployment.go
  • svc/ctrl/services/deployment/create_deployment.go
  • svc/ctrl/services/deployment/get_deployment.go
  • svc/ctrl/services/deployment/service.go
  • svc/ctrl/worker/deploy/deploy_handler.go
  • svc/ctrl/worker/githubwebhook/handle_push.go
  • svc/frontline/services/router/BUILD.bazel
  • svc/frontline/services/router/service.go
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/deployment-status-badge.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/domain_list.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/deployments-list.tsx
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/utils/get-row-class.ts
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts
  • web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx
  • web/apps/dashboard/lib/collections/deploy/deployments.ts
  • web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts
  • web/apps/dashboard/lib/trpc/routers/index.ts
  • web/internal/db/src/schema/deployments.ts

Comment on lines +51 to +83

	// Cancel the Restate invocation if we have an invocation ID
	if dep.InvocationID.Valid && dep.InvocationID.String != "" {
		if s.restateAdmin == nil {
			return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("restate admin client is not configured"))
		}

		if killErr := s.restateAdmin.KillInvocation(ctx, dep.InvocationID.String); killErr != nil {
			logger.Warn("failed to kill deployment invocation",
				"deployment_id", deploymentID,
				"invocation_id", dep.InvocationID.String,
				"error", killErr,
			)
			// Continue with status update even if kill fails
		} else {
			logger.Info("killed deployment invocation",
				"deployment_id", deploymentID,
				"invocation_id", dep.InvocationID.String,
			)
		}
	} else {
		logger.Warn("no invocation ID stored for deployment, updating status only",
			"deployment_id", deploymentID,
		)
	}

	// Mark the deployment as cancelled
	err = db.Query.UpdateDeploymentStatus(ctx, s.db.RW(), db.UpdateDeploymentStatusParams{
		ID:        deploymentID,
		Status:    db.DeploymentsStatusCancelled,
		UpdatedAt: sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
	})
	if err != nil {
		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to update deployment status: %w", err))

⚠️ Potential issue | 🔴 Critical

Only write cancelled after a real, current cancel.

Both the killErr / missing-invocation path and the stale-read race still fall through to UpdateDeploymentStatus. That can mark a deployment cancelled even though its workflow kept running or already reached a terminal state. Make the status write conditional on successful termination and the deployment still being in a cancellable status.

Comment on lines +300 to +307

	// Store invocation ID for later cancellation
	if updateErr := db.Query.UpdateDeploymentInvocationID(ctx, s.db.RW(), db.UpdateDeploymentInvocationIDParams{
		ID:           deploymentID,
		InvocationID: sql.NullString{Valid: true, String: invocation.Id()},
		UpdatedAt:    sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
	}); updateErr != nil {
		logger.Warn("failed to store invocation ID", "deployment_id", deploymentID, "error", updateErr)
	}

⚠️ Potential issue | 🟠 Major

Don't treat invocation-id persistence as best-effort.

If Lines 301-307 fail, the workflow is already running but this deployment can no longer be cancelled/killed later—the core capability added in this PR. This needs a retry or compensating path, not just a warning.


Comment on lines +186 to +187

```go
time.Sleep(30 * time.Second)
```
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Restate Go SDK durable workflows, what is the recommended way to delay execution? Is using time.Sleep inside a workflow/object handler supported, and what are the replay/cancellation implications?

💡 Result:

Recommended way to delay execution (Go SDK)

Use Restate’s durable timers, not Go’s time.Sleep:

  • restate.Sleep(ctx, d) to pause a handler durably. The timer is tracked by Restate so it survives crashes/restarts and resumes with the remaining time. During the sleep Restate can suspend the handler (freeing resources), and on retry it replays via the journal rather than redoing completed context actions. [1][2]
  • Prefer delayed messages (schedule an invocation in the future) when the handler shouldn’t stay “in-flight” during the delay—Restate explicitly recommends delayed messages over “sleep + send” for long delays / object concurrency / versioning reasons. [1]

Is time.Sleep inside a workflow/object handler supported?

It’s not the recommended/supported durable-wait mechanism in Restate handlers. time.Sleep is just a local process block: it doesn’t create a durable timer entry in Restate’s journal, doesn’t let Restate suspend the handler, and interacts poorly with replay/failures. Restate’s durability/replay is built around context actions (like sleep/timers, service calls, run blocks) being recorded and replayed from the journal. [2]

Replay + cancellation implications

  • Replay/failures: With restate.Sleep, Restate can recover correctly (e.g., if it slept 8h of a 12h timer, it will only sleep ~4h more). With time.Sleep, a crash/retry can cause the handler to sleep again from scratch, because that wait wasn’t durably recorded. [1][2]
  • Cancellation: When an invocation is canceled, Restate surfaces the cancellation as a terminal error the next time the handler awaits a Restate context Promise/Future/action (including sleep/timers). So restate.Sleep / restate.After(...).Done() will return a cancellation error promptly at the await point. A plain time.Sleep will generally not be interrupted by Restate cancellation; the handler will remain blocked until the sleep ends and only then observe anything (if it checks). [3][4]

Bottom line: use restate.Sleep/restate.After (or delayed messages) for delays; avoid time.Sleep inside workflow/object handlers. [1][3]


🏁 Script executed:

```shell
# Find the deploy_handler.go file
fd deploy_handler.go
```

Repository: unkeyed/unkey


🏁 Script executed:

```shell
# Check the file to confirm lines 186-187 and surrounding context
cat -n svc/ctrl/worker/deploy/deploy_handler.go | sed -n '180,195p'
```


🏁 Script executed:

```shell
# Look for the Deploy method signature and Workflow type to confirm it's a Restate handler
rg -A 5 "func.*Deploy" svc/ctrl/worker/deploy/deploy_handler.go | head -30
```


Replace time.Sleep with restate.Sleep (or remove if debug code).

Line 186 blocks the handler with a hard 30s delay, which breaks Restate durability (crashes retry from scratch), prevents cancellation, and reduces deployment responsiveness. Replace with restate.Sleep(ctx, 30*time.Second) for a durable timer, or remove entirely if this is temporary debug code.

✂️ Minimal fix (durable)

```diff
-	time.Sleep(30 * time.Second)
+	restate.Sleep(ctx, 30*time.Second)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/worker/deploy/deploy_handler.go` around lines 186 - 187, Replace the
blocking time.Sleep call with a cancellable, durable timer by using
restate.Sleep(ctx, 30*time.Second) (or remove the sleep if it was only for
debugging); locate the blocking call in deploy_handler.go (e.g., inside the
deploy handler function where time.Sleep(30 * time.Second) is used) and swap it
to restate.Sleep(ctx, 30*time.Second) so the operation respects the passed
context and Restate retry/cancellation semantics.

Comment on lines +185 to +197

```go
// Store invocation ID for later cancellation
invocationID := inv.GetInvocationId()
if invocationID != "" {
	if storeErr := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
		return db.Query.UpdateDeploymentInvocationID(runCtx, s.db.RW(), db.UpdateDeploymentInvocationIDParams{
			ID:           deploymentID,
			InvocationID: sql.NullString{Valid: true, String: invocationID},
			UpdatedAt:    sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
		})
	}, restate.WithName("store invocation id")); storeErr != nil {
		logger.Warn("failed to store invocation ID", "deployment_id", deploymentID, "error", storeErr)
	}
}
```
⚠️ Potential issue | 🟠 Major

Don't downgrade invocation-ID persistence to a warning.

This write is the only durable link between the deployment row and the Restate workflow. If it fails, the workflow still runs but later cancellation has no kill handle and can only mutate DB state. Please make this retryable/compensated instead of returning success here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@svc/ctrl/worker/githubwebhook/handle_push.go` around lines 185 - 197, The
persistence of invocationID (invocationID := inv.GetInvocationId()) must not be
downgraded to a warn; if UpdateDeploymentInvocationID
(db.Query.UpdateDeploymentInvocationID) fails the handler should surface/handle
the error so the workflow retains a cancel handle. Replace the logger.Warn path:
make the call retryable or return the error from the surrounding function (i.e.,
propagate the error instead of swallowing it). Use a retry/backoff loop or the
existing restate.RunVoid wrapper with retry semantics (or wrap
UpdateDeploymentInvocationID in a retry helper) and if retries ultimately fail
return an error (or abort the request) rather than logging and continuing; keep
references to invocationID, restate.RunVoid and UpdateDeploymentInvocationID to
locate the change.

Comment on lines +18 to +24

```tsx
onSuccess: () => {
  utils.invalidate();
  try {
    collection.deployments.utils.refetch();
  } catch (error) {
    console.error("Refetch error:", error);
  }
```
⚠️ Potential issue | 🟠 Major

Scope and await the post-cancel cache refresh to avoid a racy/stale UI.

Line 19 invalidates the entire TRPC cache, and the refetch on Line 21 is not awaited before closing the dialog and toasting. This can cause unnecessary network churn and show success feedback before the deployment list is actually fresh.

💡 Proposed fix

```diff
-  const cancelDeployment = trpc.deploy.deployment.cancel.useMutation({
-    onSuccess: () => {
-      utils.invalidate();
-      try {
-        collection.deployments.utils.refetch();
-      } catch (error) {
-        console.error("Refetch error:", error);
-      }
-      onClose();
-      toast.success("Deployment cancelled");
-    },
+  const cancelDeployment = trpc.deploy.deployment.cancel.useMutation({
+    onSuccess: async () => {
+      await Promise.all([
+        utils.deploy.deployment.list.invalidate({ projectId: selectedDeployment.projectId }),
+        collection.deployments.utils.refetch(),
+      ]);
+      onClose();
+      toast.success("Deployment cancelled");
+    },
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx
around lines 18 - 24, The onSuccess handler currently calls utils.invalidate()
and fires collection.deployments.utils.refetch() without awaiting it, causing
racey UI; update the handler to scope invalidation/refetch to the deployments
query (use collection.deployments.utils.invalidate() or
collection.deployments.utils.refetch()) and await the promise(s) (await
collection.deployments.utils.refetch() and await any async invalidate call)
inside a try/catch before closing the dialog or showing success to ensure the
deployments list is refreshed when the UI updates.

Comment on lines +75 to +85
await ctrl
.cancelDeployment({
deploymentId: input.deploymentId,
})
.catch((err) => {
console.error(err);
throw new TRPCError({
code: "INTERNAL_SERVER_ERROR",
message: err,
});
});
⚠️ Potential issue | 🟠 Major

Keep ctrl cancellation failures intact.

This catch rewrites NOT_FOUND / FAILED_PRECONDITION responses from the control plane into INTERNAL_SERVER_ERROR, and message: err can surface an unusable message. Common cancel races will read like 500s in the UI. Translate the ctrl failure instead of overwriting it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts` around lines
75 - 85, The current catch on ctrl.cancelDeployment({ deploymentId:
input.deploymentId }) swallows control-plane errors and converts everything to a
generic INTERNAL_SERVER_ERROR with message: err; instead, detect and rethrow the
original error when it already represents a TRPCError (or has a code-like
property) so NOT_FOUND/FAILED_PRECONDITION are preserved, and when wrapping is
necessary create a TRPCError using a safe string message (e.g., err.message or
String(err)) rather than the raw err object; update the catch around
ctrl.cancelDeployment to check for err instanceof TRPCError (or err.code) and
throw err directly, otherwise throw new TRPCError({ code:
"INTERNAL_SERVER_ERROR", message: err.message ?? String(err) }).
