📝 Walkthrough

This pull request introduces a cancel-deployment feature across the entire application stack. The changes add database support for tracking invocation IDs and a cancelled deployment status, extend the control-service protobuf with a new CancelDeployment RPC, implement the corresponding backend handler, integrate the Restate admin client for invocation termination, and update the frontend to display and trigger the cancellation UI.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)

Warning: there were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure. 🔧 golangci-lint (2.11.3): command failed.
cries in merge conflicts
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts (1)
56-69: ⚠️ Potential issue | 🟡 Minor

`ready` still falls through to the default color. Line 57 checks `"completed"`, but this file exposes `"ready"` as the grouped terminal status. Ready filters will still render with `bg-info-9` after this change.

💡 Minimal fix

```diff
- if (value === "completed") {
+ if (value === "ready") {
    return "bg-success-9";
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts` around lines 56-69, getColorClass currently only checks for "completed" and leaves the grouped terminal status "ready" falling through to the default; update getColorClass to treat "ready" as a terminal/completed status (e.g., return the same "bg-success-9" as "completed") by adding a check for value === "ready" or combining it with the existing "completed" condition so ready filters render with the correct green background.

web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx (1)
21-53: ⚠️ Potential issue | 🟡 Minor

Cancelled deployments continue polling indefinitely.

Lines 28 and 38 refetch both queries every second unconditionally, but line 46 stops the local timer when `isCancelled` is true. Since cancelled deployments remain on this page without automatic redirect, the client will keep hitting both endpoints until the user navigates away.

Minimal fix

```diff
  const steps = trpc.deploy.deployment.steps.useQuery(
    {
      deploymentId: deployment.id,
    },
    {
-     refetchInterval: 1_000,
+     refetchInterval: isCancelled ? false : 1_000,
    },
  );
@@
  const buildSteps = trpc.deploy.deployment.buildSteps.useQuery(
    {
      deploymentId: deployment.id,
      includeStepLogs: true,
    },
    {
-     refetchInterval: 1_000,
+     refetchInterval: isCancelled ? false : 1_000,
    },
  );
```

This pattern is already used in the codebase (e.g., `use-fetch-timeseries.ts`).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx` around lines 21-53, the steps and buildSteps queries are refetching every second unconditionally, causing continued polling for cancelled deployments; update the trpc.deploy.deployment.steps.useQuery and trpc.deploy.deployment.buildSteps.useQuery calls to disable polling when isCancelled (or equivalently add enabled: !isCancelled, or set refetchInterval: undefined when isCancelled) so they stop refetching for cancelled deployments, and remove any reliance on the local now timer in the queries (leave the useEffect for the now display untouched). Target the steps and buildSteps query option objects to conditionally set refetchInterval or enabled based on the isCancelled boolean.
🧹 Nitpick comments (3)
web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx (1)
35-42: Don't swallow mutation failures in `handleCancel`.

`mutateAsync` already routes errors through `onError`; the extra `.catch` only logs and suppresses explicit failure handling at the call site. As per coding guidelines: "Use `fault` library for error handling".

💡 Proposed fix

```diff
  const handleCancel = async () => {
-   await cancelDeployment
-     .mutateAsync({
-       deploymentId: selectedDeployment.id,
-     })
-     .catch((error) => {
-       console.error("Cancel error:", error);
-     });
+   await cancelDeployment.mutateAsync({
+     deploymentId: selectedDeployment.id,
+   });
  };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx` around lines 35-42, the current handleCancel swallows errors by attaching a .catch to cancelDeployment.mutateAsync; remove that .catch so errors can propagate to the mutation's onError handler, and instead route failures through the project's fault error-handling mechanism (e.g., wrap or capture the promise with the fault library as per guidelines) so failures are reported but not suppressed; update handleCancel to call cancelDeployment.mutateAsync({ deploymentId: selectedDeployment.id }) without the .catch and integrate fault (using the project's fault API) to record/handle any thrown error.

web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx (1)
40-48: Type the cancellable-status list as deployment statuses.

This currently relies on a plain string array; tightening it to `Deployment["status"]` avoids silent typos and keeps illegal states harder to represent. As per coding guidelines: "Make illegal states unrepresentable by modeling domain with discriminated unions and parsing inputs at boundaries into typed structures".

💡 Proposed fix

```diff
- const cancellableStatuses = [
+ const cancellableStatuses: ReadonlySet<Deployment["status"]> = new Set([
    "pending",
    "starting",
    "building",
    "deploying",
    "network",
    "finalizing",
- ];
- const canCancel = cancellableStatuses.includes(selectedDeployment.status);
+ ]);
+ const canCancel = cancellableStatuses.has(selectedDeployment.status);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx` around lines 40-48, the cancellableStatuses array is untyped and can silently accept typos; change it to a typed array of the deployment status union (e.g., declare cancellableStatuses as Deployment["status"][] and import the Deployment type) so the compiler enforces valid statuses, then keep the existing canCancel check (const canCancel = cancellableStatuses.includes(selectedDeployment.status)) to work with the typed statuses; ensure the literal values exactly match members of Deployment["status"] or adjust to use a const assertion if needed.

svc/ctrl/services/deployment/service.go (1)
33-45: Reject nil `RestateAdmin` in `New`.

`CancelDeployment` already treats a missing admin client as misconfiguration, but `New` still constructs `Service` in that state. Fail fast here so the service cannot start partially configured. As per coding guidelines, "Make illegal states unrepresentable by modeling domain with ADTs/discriminated unions and parsing inputs at boundaries into typed structures".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@svc/ctrl/services/deployment/service.go` around lines 33 - 45, New currently allows constructing a Service with a nil RestateAdmin, which defers a configuration error to runtime; update New to validate cfg.RestateAdmin and fail fast: check cfg.RestateAdmin for nil at the start of New and return a clear fatal error (panic or log.Fatalf) with context naming RestateAdmin (or change New's signature to return (*Service, error) and return an error) so the Service cannot be constructed in a partially configured state; reference the New function, Service struct and cfg.RestateAdmin when making the change.
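The fail-fast constructor the nitpick asks for can be sketched like this. A minimal standalone sketch: the `restateAdmin` interface and `Service` fields here are illustrative stand-ins, not the actual types in `svc/ctrl/services/deployment`.

```go
package main

import (
	"errors"
	"fmt"
)

// restateAdmin stands in for the real admin client interface from
// pkg/restate/admin; only its presence matters for this sketch.
type restateAdmin interface {
	KillInvocation(invocationID string) error
}

// Service mirrors the shape of the deployment service: it must never
// exist without a configured admin client.
type Service struct {
	admin restateAdmin
}

// New returns an error instead of constructing a half-configured
// Service, so the nil check inside CancelDeployment becomes unreachable.
func New(admin restateAdmin) (*Service, error) {
	if admin == nil {
		return nil, errors.New("deployment service: RestateAdmin is required")
	}
	return &Service{admin: admin}, nil
}

func main() {
	_, err := New(nil)
	fmt.Println(err != nil) // true: nil admin is rejected at construction
}
```

Returning `(*Service, error)` keeps the misconfiguration visible at the call site; a `panic` in `New` would also fail fast but hides the decision from callers.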
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/restate/admin/client.go`:
- Around line 105-125: The terminateInvocation function should reject empty
invocationID before constructing the URL and interpreting a 404 as success: add
a guard at the top of Client.terminateInvocation that returns a clear error
(e.g., fmt.Errorf or errors.New) when invocationID == "" so you never call c.do
or map a 404 to nil for a missing ID; keep the rest of the logic (URL formation,
response handling) unchanged and reference the same function name
terminateInvocation to locate where to add the check.
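The guard described above is small; here is a minimal standalone sketch of the idea. The URL path, helper name, and sentinel error are assumptions for illustration, not the actual `pkg/restate/admin` code.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// errEmptyInvocationID is an illustrative sentinel error.
var errEmptyInvocationID = errors.New("restate admin: invocation ID must not be empty")

// buildTerminateURL rejects an empty ID before any URL is formed, so a
// missing ID can never reach the server and have its 404 mapped to "success".
func buildTerminateURL(baseURL, invocationID string) (string, error) {
	if invocationID == "" {
		return "", errEmptyInvocationID
	}
	// Escape the ID so it cannot inject extra path segments.
	return fmt.Sprintf("%s/invocations/%s", baseURL, url.PathEscape(invocationID)), nil
}

func main() {
	u, err := buildTerminateURL("http://localhost:9070", "inv_123")
	fmt.Println(u, err) // http://localhost:9070/invocations/inv_123 <nil>

	_, err = buildTerminateURL("http://localhost:9070", "")
	fmt.Println(err) // restate admin: invocation ID must not be empty
}
```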
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a6fee186-d49e-47a4-ad8d-31a20d780ffd
⛔ Files ignored due to path filters (4)

- `gen/proto/ctrl/v1/ctrlv1connect/deployment.connect.go` is excluded by `!**/gen/**`
- `gen/proto/ctrl/v1/deployment.pb.go` is excluded by `!**/*.pb.go`, `!**/gen/**`
- `gen/rpc/ctrl/deployment_generated.go` is excluded by `!**/gen/**`
- `web/apps/dashboard/gen/proto/ctrl/v1/deployment_pb.ts` is excluded by `!**/gen/**`
📒 Files selected for processing (41)

- `pkg/db/BUILD.bazel`
- `pkg/db/deployment_find_by_id.sql_generated.go`
- `pkg/db/deployment_find_by_k8s_name.sql_generated.go`
- `pkg/db/deployment_list_by_environment_id_and_status.sql_generated.go`
- `pkg/db/deployment_topology_list_by_versions.sql_generated.go`
- `pkg/db/deployment_topology_list_desired.sql_generated.go`
- `pkg/db/deployment_update_invocation_id.sql_generated.go`
- `pkg/db/models_generated.go`
- `pkg/db/querier_generated.go`
- `pkg/db/queries/deployment_update_invocation_id.sql`
- `pkg/db/schema.sql`
- `pkg/restate/admin/client.go`
- `svc/api/internal/testutil/mock_deployment_client.go`
- `svc/api/openapi/gen.go`
- `svc/api/openapi/openapi-generated.yaml`
- `svc/api/openapi/spec/paths/v2/deploy/getDeployment/V2DeployGetDeploymentResponseData.yaml`
- `svc/api/routes/v2_deploy_get_deployment/handler.go`
- `svc/ctrl/api/run.go`
- `svc/ctrl/proto/ctrl/v1/deployment.proto`
- `svc/ctrl/services/deployment/BUILD.bazel`
- `svc/ctrl/services/deployment/cancel_deployment.go`
- `svc/ctrl/services/deployment/create_deployment.go`
- `svc/ctrl/services/deployment/get_deployment.go`
- `svc/ctrl/services/deployment/service.go`
- `svc/ctrl/worker/deploy/deploy_handler.go`
- `svc/ctrl/worker/githubwebhook/handle_push.go`
- `svc/frontline/services/router/BUILD.bazel`
- `svc/frontline/services/router/service.go`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/[deploymentId]/(deployment-progress)/deployment-progress.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/deployment-list-table-action.popover.constants.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/deployment-status-badge.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/domain_list.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/deployments-list.tsx`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/utils/get-row-class.ts`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/filters.schema.ts`
- `web/apps/dashboard/app/(app)/[workspaceSlug]/projects/[projectId]/components/deployment-status-badge.tsx`
- `web/apps/dashboard/lib/collections/deploy/deployments.ts`
- `web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts`
- `web/apps/dashboard/lib/trpc/routers/index.ts`
- `web/internal/db/src/schema/deployments.ts`
```go
// Cancel the Restate invocation if we have an invocation ID
if dep.InvocationID.Valid && dep.InvocationID.String != "" {
	if s.restateAdmin == nil {
		return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("restate admin client is not configured"))
	}

	if killErr := s.restateAdmin.KillInvocation(ctx, dep.InvocationID.String); killErr != nil {
		logger.Warn("failed to kill deployment invocation",
			"deployment_id", deploymentID,
			"invocation_id", dep.InvocationID.String,
			"error", killErr,
		)
		// Continue with status update even if kill fails
	} else {
		logger.Info("killed deployment invocation",
			"deployment_id", deploymentID,
			"invocation_id", dep.InvocationID.String,
		)
	}
} else {
	logger.Warn("no invocation ID stored for deployment, updating status only",
		"deployment_id", deploymentID,
	)
}

// Mark the deployment as cancelled
err = db.Query.UpdateDeploymentStatus(ctx, s.db.RW(), db.UpdateDeploymentStatusParams{
	ID:        deploymentID,
	Status:    db.DeploymentsStatusCancelled,
	UpdatedAt: sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
})
if err != nil {
	return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("failed to update deployment status: %w", err))
```
Only write `cancelled` after a real, current cancel.

Both the `killErr` / missing-invocation path and the stale-read race still fall through to `UpdateDeploymentStatus`. That can mark a deployment cancelled even though its workflow kept running or already reached a terminal state. Make the status write conditional on successful termination and on the deployment still being in a cancellable status.
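The conditional write can live in the UPDATE itself. A minimal sketch of the guard logic; the status names come from the cancellable list elsewhere in this PR, and the SQL in the comment is illustrative, not the generated query:

```go
package main

import "fmt"

// cancellable mirrors the non-terminal statuses this PR treats as
// cancellable in the dashboard popover constants.
var cancellable = map[string]bool{
	"pending":    true,
	"starting":   true,
	"building":   true,
	"deploying":  true,
	"network":    true,
	"finalizing": true,
}

// tryCancel models a compare-and-set transition. In SQL this would be
//   UPDATE deployments SET status = 'cancelled'
//   WHERE id = ? AND status IN ('pending', 'starting', ...)
// with RowsAffected == 0 signalling a lost race against a terminal state.
func tryCancel(status string) (next string, ok bool) {
	if !cancellable[status] {
		return status, false // already terminal or cancelled: leave untouched
	}
	return "cancelled", true
}

func main() {
	fmt.Println(tryCancel("building")) // cancelled true
	fmt.Println(tryCancel("ready"))    // ready false
}
```

With the guard in the WHERE clause, the stale-read race resolves at the database rather than in handler code.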
```go
// Store invocation ID for later cancellation
if updateErr := db.Query.UpdateDeploymentInvocationID(ctx, s.db.RW(), db.UpdateDeploymentInvocationIDParams{
	ID:           deploymentID,
	InvocationID: sql.NullString{Valid: true, String: invocation.Id()},
	UpdatedAt:    sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
}); updateErr != nil {
	logger.Warn("failed to store invocation ID", "deployment_id", deploymentID, "error", updateErr)
}
```
Don't treat invocation-id persistence as best-effort.
If Lines 301-307 fail, the workflow is already running but this deployment can no longer be cancelled/killed later—the core capability added in this PR. This needs a retry or compensating path, not just a warning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@svc/ctrl/services/deployment/create_deployment.go` around lines 300 - 307,
The code treats persisting the invocation ID
(db.Query.UpdateDeploymentInvocationID call using parameters ID=deploymentID and
InvocationID=sql.NullString{String: invocation.Id()}) as best-effort; instead
make persistence mandatory: after creating the workflow/invocation, if
UpdateDeploymentInvocationID returns an error, either retry the DB update a few
times with backoff or cancel/terminate the started invocation and return an
error to the caller so the deployment can be retried/rolled back. Concretely,
wrap the db.Query.UpdateDeploymentInvocationID call in a retry loop (with small
sleep/backoff) and only proceed/return success once it succeeds; if retries
exhaust, call the invocation cancel/terminate API (using invocation.Id()) and
return an error (no longer just logger.Warn) so the caller knows persistence
failed.
```go
time.Sleep(30 * time.Second)
```
🧩 Analysis chain
🌐 Web query:
In Restate Go SDK durable workflows, what is the recommended way to delay execution? Is using time.Sleep inside a workflow/object handler supported, and what are the replay/cancellation implications?
💡 Result:
Recommended way to delay execution (Go SDK)
Use Restate’s durable timers, not Go’s time.Sleep:
- `restate.Sleep(ctx, d)` to pause a handler durably. The timer is tracked by Restate so it survives crashes/restarts and resumes with the remaining time. During the sleep Restate can suspend the handler (freeing resources), and on retry it replays via the journal rather than redoing completed context actions. [1][2]
- Prefer delayed messages (schedule an invocation in the future) when the handler shouldn't stay "in-flight" during the delay; Restate explicitly recommends delayed messages over "sleep + send" for long delays / object concurrency / versioning reasons. [1]
Is time.Sleep inside a workflow/object handler supported?
It’s not the recommended/supported durable-wait mechanism in Restate handlers. time.Sleep is just a local process block: it doesn’t create a durable timer entry in Restate’s journal, doesn’t let Restate suspend the handler, and interacts poorly with replay/failures. Restate’s durability/replay is built around context actions (like sleep/timers, service calls, run blocks) being recorded and replayed from the journal. [2]
Replay + cancellation implications
- Replay/failures: With `restate.Sleep`, Restate can recover correctly (e.g., if it slept 8h of a 12h timer, it will only sleep ~4h more). With `time.Sleep`, a crash/retry can cause the handler to sleep again from scratch, because that wait wasn't durably recorded. [1][2]
- Cancellation: When an invocation is canceled, Restate surfaces the cancellation as a terminal error the next time the handler awaits a Restate context Promise/Future/action (including sleep/timers). So `restate.Sleep` / `restate.After(...).Done()` will return a cancellation error promptly at the await point. A plain `time.Sleep` will generally not be interrupted by Restate cancellation; the handler will remain blocked until the sleep ends and only then observe anything (if it checks). [3][4]
Bottom line: use restate.Sleep/restate.After (or delayed messages) for delays; avoid time.Sleep inside workflow/object handlers. [1][3]
🏁 Script executed:

```shell
# Find the deploy_handler.go file
fd deploy_handler.go
```

Repository: unkeyed/unkey
Length of output: 97

🏁 Script executed:

```shell
# Check the file to confirm lines 186-187 and surrounding context
cat -n svc/ctrl/worker/deploy/deploy_handler.go | sed -n '180,195p'
```

Repository: unkeyed/unkey
Length of output: 632

🏁 Script executed:

```shell
# Look for the Deploy method signature and Workflow type to confirm it's a Restate handler
rg -A 5 "func.*Deploy" svc/ctrl/worker/deploy/deploy_handler.go | head -30
```

Repository: unkeyed/unkey
Length of output: 1322
Replace `time.Sleep` with `restate.Sleep` (or remove it if it is debug code).

Line 186 blocks the handler with a hard 30s delay, which breaks Restate durability (crashes retry from scratch), prevents cancellation, and reduces deployment responsiveness. Replace it with `restate.Sleep(ctx, 30*time.Second)` for a durable timer, or remove it entirely if this is temporary debug code.

✂️ Minimal fix (durable)

```diff
- time.Sleep(30 * time.Second)
+ restate.Sleep(ctx, 30*time.Second)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@svc/ctrl/worker/deploy/deploy_handler.go` around lines 186 - 187, Replace the
blocking time.Sleep call with a cancellable, durable timer by using
restate.Sleep(ctx, 30*time.Second) (or remove the sleep if it was only for
debugging); locate the blocking call in deploy_handler.go (e.g., inside the
deploy handler function where time.Sleep(30 * time.Second) is used) and swap it
to restate.Sleep(ctx, 30*time.Second) so the operation respects the passed
context and Restate retry/cancellation semantics.
```go
// Store invocation ID for later cancellation
invocationID := inv.GetInvocationId()
if invocationID != "" {
	if storeErr := restate.RunVoid(ctx, func(runCtx restate.RunContext) error {
		return db.Query.UpdateDeploymentInvocationID(runCtx, s.db.RW(), db.UpdateDeploymentInvocationIDParams{
			ID:           deploymentID,
			InvocationID: sql.NullString{Valid: true, String: invocationID},
			UpdatedAt:    sql.NullInt64{Valid: true, Int64: time.Now().UnixMilli()},
		})
	}, restate.WithName("store invocation id")); storeErr != nil {
		logger.Warn("failed to store invocation ID", "deployment_id", deploymentID, "error", storeErr)
	}
}
```
Don't downgrade invocation-ID persistence to a warning.
This write is the only durable link between the deployment row and the Restate workflow. If it fails, the workflow still runs but later cancellation has no kill handle and can only mutate DB state. Please make this retryable/compensated instead of returning success here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@svc/ctrl/worker/githubwebhook/handle_push.go` around lines 185 - 197, The
persistence of invocationID (invocationID := inv.GetInvocationId()) must not be
downgraded to a warn; if UpdateDeploymentInvocationID
(db.Query.UpdateDeploymentInvocationID) fails the handler should surface/handle
the error so the workflow retains a cancel handle. Replace the logger.Warn path:
make the call retryable or return the error from the surrounding function (i.e.,
propagate the error instead of swallowing it). Use a retry/backoff loop or the
existing restate.RunVoid wrapper with retry semantics (or wrap
UpdateDeploymentInvocationID in a retry helper) and if retries ultimately fail
return an error (or abort the request) rather than logging and continuing; keep
references to invocationID, restate.RunVoid and UpdateDeploymentInvocationID to
locate the change.
```tsx
onSuccess: () => {
  utils.invalidate();
  try {
    collection.deployments.utils.refetch();
  } catch (error) {
    console.error("Refetch error:", error);
  }
```
Scope and await the post-cancel cache refresh to avoid a racy/stale UI.

Line 19 invalidates the entire TRPC cache, and the refetch on line 21 is not awaited before closing/toasting. This can cause unnecessary network churn and success feedback before the deployment list is actually fresh.

💡 Proposed fix

```diff
- const cancelDeployment = trpc.deploy.deployment.cancel.useMutation({
-   onSuccess: () => {
-     utils.invalidate();
-     try {
-       collection.deployments.utils.refetch();
-     } catch (error) {
-       console.error("Refetch error:", error);
-     }
-     onClose();
-     toast.success("Deployment cancelled");
-   },
+ const cancelDeployment = trpc.deploy.deployment.cancel.useMutation({
+   onSuccess: async () => {
+     await Promise.all([
+       utils.deploy.deployment.list.invalidate({ projectId: selectedDeployment.projectId }),
+       collection.deployments.utils.refetch(),
+     ]);
+     onClose();
+     toast.success("Deployment cancelled");
+   },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@web/apps/dashboard/app/`(app)/[workspaceSlug]/projects/[projectId]/(overview)/deployments/components/table/components/actions/cancel-dialog.tsx
around lines 18 - 24, The onSuccess handler currently calls utils.invalidate()
and fires collection.deployments.utils.refetch() without awaiting it, causing
racey UI; update the handler to scope invalidation/refetch to the deployments
query (use collection.deployments.utils.invalidate() or
collection.deployments.utils.refetch()) and await the promise(s) (await
collection.deployments.utils.refetch() and await any async invalidate call)
inside a try/catch before closing the dialog or showing success to ensure the
deployments list is refreshed when the UI updates.
```ts
await ctrl
  .cancelDeployment({
    deploymentId: input.deploymentId,
  })
  .catch((err) => {
    console.error(err);
    throw new TRPCError({
      code: "INTERNAL_SERVER_ERROR",
      message: err,
    });
  });
```
Keep ctrl cancellation failures intact.

This catch rewrites NOT_FOUND / FAILED_PRECONDITION responses from the control plane into INTERNAL_SERVER_ERROR, and `message: err` passes an Error object where a string is expected, so it can surface an unusable message. Common cancel races will read like 500s in the UI. Translate the ctrl failure instead of overwriting it here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/apps/dashboard/lib/trpc/routers/deploy/deployment/cancel.ts` around lines
75 - 85, The current catch on ctrl.cancelDeployment({ deploymentId:
input.deploymentId }) swallows control-plane errors and converts everything to a
generic INTERNAL_SERVER_ERROR with message: err; instead, detect and rethrow the
original error when it already represents a TRPCError (or has a code-like
property) so NOT_FOUND/FAILED_PRECONDITION are preserved, and when wrapping is
necessary create a TRPCError using a safe string message (e.g., err.message or
String(err)) rather than the raw err object; update the catch around
ctrl.cancelDeployment to check for err instanceof TRPCError (or err.code) and
throw err directly, otherwise throw new TRPCError({ code:
"INTERNAL_SERVER_ERROR", message: err.message ?? String(err) }).
Stores the Restate invocation ID of a deployment so that it can be cancelled/killed later.