Add OpenTelemetry support for distributed tracing, metrics, and structured observability #561

@sonathaj

What would you like to be added?

feat: Add OpenTelemetry support for distributed tracing and metrics

It would be great to have OpenTelemetry support added to Synapse to provide deeper visibility into workflow execution. Below are the metrics and signals that would be most valuable:


Distributed Tracing

It would be wonderful if each workflow instance execution had an associated trace, covering:

  • End-to-end span from workflow start to completion, fault, or cancellation
  • Retry attempt spans, capturing the attempt number and the error that triggered the retry
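
To make the requested trace shape concrete, here is a rough, library-independent sketch in Python. It does not use the OpenTelemetry SDK, and it is not how Synapse is implemented. The `Span` class, the span names (`workflow.run`, `task.retry`), and the attribute keys are all assumptions chosen for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Simplified stand-in for an OpenTelemetry span (not the real SDK type)."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict = field(default_factory=dict)
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self, attributes: dict) -> None:
        """Attach final attributes and close the span."""
        self.attributes.update(attributes)
        self.end = time.monotonic()


def trace_workflow(name: str, version: str, attempt_errors: list) -> list:
    """Build one trace: a root workflow span plus one child span per retry."""
    trace_id = uuid.uuid4().hex
    root = Span("workflow.run", trace_id,
                attributes={"workflow.name": name, "workflow.version": version})
    spans = [root]
    for attempt, error in enumerate(attempt_errors, start=1):
        # Each retry span records the attempt number and the triggering error.
        retry = Span("task.retry", trace_id, parent_id=root.span_id)
        retry.finish({"retry.attempt": attempt, "retry.error": error})
        spans.append(retry)
    # The root span closes on completion, fault, or cancellation;
    # this sketch hard-codes the "completed" phase.
    root.finish({"workflow.phase": "completed"})
    return spans
```

Calling `trace_workflow("order", "1.0", ["communication", "runtime"])` would yield one root span and two retry spans that share its trace ID.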

Workflow Metrics

It would be helpful to have the following workflow-level metrics:

  • Total workflow instances started, labelled by workflow name and version
  • Total completed instances, broken down by final phase (completed, faulted, cancelled)
  • Histogram of workflow execution duration per workflow
  • Gauge of currently active instances (pending, running, suspended, waiting)
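
As an illustration only, the four workflow instruments above could be modelled like this in plain Python. `WorkflowMetrics` and its method names are hypothetical. A real implementation would presumably use OpenTelemetry counter, histogram, and gauge instruments, and might label the gauge by phase, which this sketch does not do.

```python
from collections import Counter, defaultdict


class WorkflowMetrics:
    """In-memory stand-in for the four proposed workflow instruments."""

    FINAL_PHASES = ("completed", "faulted", "cancelled")

    def __init__(self) -> None:
        self.started = Counter()            # key: (name, version)
        self.finished = Counter()           # key: (name, version, phase)
        self.durations = defaultdict(list)  # key: name, value: seconds samples
        self.active = 0                     # gauge of in-flight instances

    def on_start(self, name: str, version: str) -> None:
        self.started[(name, version)] += 1
        self.active += 1

    def on_finish(self, name: str, version: str, phase: str, seconds: float) -> None:
        if phase not in self.FINAL_PHASES:
            raise ValueError(f"unexpected final phase: {phase}")
        self.finished[(name, version, phase)] += 1
        # A real histogram would bucket the value; raw samples are kept here.
        self.durations[name].append(seconds)
        self.active -= 1
```

The gauge moves in both directions while the counters only grow, which is the main reason the active-instance count is requested as a separate instrument.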

Task Metrics

It would also be useful to have task-level metrics:

  • Total task executions, labelled by task type and outcome
  • Histogram of task execution duration per task type
  • Total retry attempts, labelled by task type and error type (communication, runtime, validation, configuration)
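
A minimal sketch of where these task metrics could be recorded, assuming a simple retry loop around each task. `run_task`, `TaskError`, and the module-level counters are hypothetical names and do not reflect Synapse's actual runner.

```python
import time
from collections import Counter

task_executions = Counter()  # key: (task_type, outcome)
task_retries = Counter()     # key: (task_type, error_type)
task_durations = []          # (task_type, seconds) samples for a histogram


class TaskError(Exception):
    """Hypothetical error carrying one of the error types named above."""

    def __init__(self, error_type: str) -> None:
        super().__init__(error_type)
        self.error_type = error_type


def run_task(task_type: str, action, max_attempts: int = 3):
    """Run `action`, recording executions, retries, and total duration."""
    started = time.monotonic()
    outcome = "faulted"
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                result = action()
                outcome = "completed"
                return result
            except TaskError as error:
                if attempt == max_attempts:
                    raise  # out of attempts: the task faults
                # Count a retry only when another attempt will follow.
                task_retries[(task_type, error.error_type)] += 1
    finally:
        # Runs once per task execution, on success and on failure.
        task_executions[(task_type, outcome)] += 1
        task_durations.append((task_type, time.monotonic() - started))
```

In this sketch one task execution is counted once regardless of how many attempts it took, so the retry counter carries the per-attempt detail.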

Correlation and Event Metrics

It would be helpful to expose metrics for the correlator:

  • Total cloud events ingested
  • Gauge of currently active correlations
  • Total correlations resolved
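
The three correlator signals could be tracked along these lines. This is a sketch under the assumption that each correlation has a unique ID. `CorrelatorMetrics` and its hook names are hypothetical.

```python
class CorrelatorMetrics:
    """In-memory stand-in for the proposed correlator instruments."""

    def __init__(self) -> None:
        self.events_ingested = 0        # counter
        self.correlations_resolved = 0  # counter
        self._active = set()            # IDs backing the active gauge

    def on_event_ingested(self) -> None:
        self.events_ingested += 1

    def on_correlation_started(self, correlation_id: str) -> None:
        self._active.add(correlation_id)

    def on_correlation_resolved(self, correlation_id: str) -> None:
        # Only count a resolution for a correlation that was active.
        if correlation_id in self._active:
            self._active.remove(correlation_id)
            self.correlations_resolved += 1

    @property
    def active_correlations(self) -> int:
        """Current gauge value."""
        return len(self._active)
```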

Thanks for your consideration.
