
feat: detect self-referencing targets across pipeline statements #61

Merged
mingjerli merged 2 commits into main from feature/gap4-self-referencing-target on Apr 14, 2026

@mingjerli
Owner

Summary

  • Self-read detection: When a multi-statement pipeline writes and reads the same table (e.g., SCD2 MERGE + INSERT on dim_customer), clgraph now creates query-scoped self-read nodes instead of collapsing both references onto a single node
  • Cycle-safe dependencies: _build_query_dependencies and _build_table_dependencies exclude self-dependencies, preventing topological sort cycles
  • Cross-query wiring: Column-granular, topo-ordered edges connect prior statement output to self-read input nodes, with edge_role and statement_order annotations on all edges
  • Patterns covered: MERGE+INSERT (SCD2), DELETE+INSERT, single-statement INSERT INTO t SELECT ... FROM t, aliased self-references
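The cycle-safe dependency bullet can be illustrated with a minimal sketch. It is not clgraph's actual `_build_table_dependencies`; the function name `build_table_dependencies`, the `(target, sources)` tuple shape, and the table names `staging_customer` and `raw_customer` are assumptions made for illustration. It only shows why dropping a table's dependency on itself keeps a topological sort from failing.

```python
from graphlib import CycleError, TopologicalSorter


def build_table_dependencies(statements, exclude_self=True):
    """Map each target table to the set of tables it reads from.

    `statements` is a list of (target_table, source_tables) tuples in
    statement order. Hypothetical stand-in, not clgraph's real signature.
    """
    deps = {}
    for target, sources in statements:
        upstream = deps.setdefault(target, set())
        for source in sources:
            # Self-exclusion guard: a table reading itself is a self-read,
            # not an upstream dependency.
            if exclude_self and source == target:
                continue
            upstream.add(source)
    return deps


# SCD2-style pipeline: both the MERGE and the INSERT read and write dim_customer.
statements = [
    ("staging_customer", ["raw_customer"]),
    ("dim_customer", ["staging_customer", "dim_customer"]),  # MERGE
    ("dim_customer", ["staging_customer", "dim_customer"]),  # INSERT
]

# With the guard, the graph is a simple chain and sorts cleanly.
order = list(TopologicalSorter(build_table_dependencies(statements)).static_order())

# Without the guard, dim_customer depends on itself and the sort fails.
try:
    list(
        TopologicalSorter(
            build_table_dependencies(statements, exclude_self=False)
        ).static_order()
    )
    naive_cycle = False
except CycleError:
    naive_cycle = True
```

Here `order` lists `raw_customer`, then `staging_customer`, then `dim_customer`, while the unguarded variant raises `CycleError`.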

Closes Gap 4 from the CDC/SCD pipeline gap analysis (docs/superpowers/specs/2026-04-13-gap4-self-referencing-target-design.md).

Files Changed

| File | Change |
| --- | --- |
| models.py | Added DELETE to SQLOperation, self_referenced_tables/self_ref_aliases to ParsedQuery, statement_order/edge_role to ColumnEdge |
| multi_query.py | Refactored _extract_source_tables to use AST node identity for target-slot detection; added exp.Delete handling |
| table.py | Self-exclusion guards in _build_query_dependencies and _build_table_dependencies |
| pipeline_lineage_builder.py | Self-read node naming, _is_self_read_column helper, _add_self_read_cross_query_edges method, edge annotations |
| pipeline.py | Added get_self_read_columns(), fixed get_column() to prefer output layer |
| test_cdc_scd_pipeline.py | 23 test methods covering all 16 spec test cases |
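The idea behind using AST node identity for target-slot detection can be sketched without a SQL parser. The `TableNode` and `Statement` classes below are hypothetical stand-ins for parsed AST nodes, not clgraph or sqlglot types, and `extract_source_tables` is an illustrative function rather than the refactored `_extract_source_tables`. The point is that filtering by name discards a self-read, while filtering by identity (`is`) skips only the node that sits in the target slot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableNode:
    """Hypothetical table reference in a parsed statement."""
    name: str
    alias: Optional[str] = None


@dataclass
class Statement:
    """Hypothetical parsed statement: the target node plus every table node."""
    target: TableNode
    all_tables: List[TableNode]  # every table node in the tree, target included


def extract_source_tables(stmt):
    """Return (source table names, aliases under which the target is re-read)."""
    sources = set()
    self_ref_aliases = set()
    for node in stmt.all_tables:
        # Identity check: skip only the node occupying the target slot,
        # not every node that happens to share the target's name.
        if node is stmt.target:
            continue
        sources.add(node.name)
        if node.name == stmt.target.name:
            self_ref_aliases.add(node.alias or node.name)
    return sources, self_ref_aliases


# INSERT INTO dim_customer SELECT ... FROM dim_customer AS cur
#   JOIN staging_customer AS s ON ...
target = TableNode("dim_customer")
stmt = Statement(
    target=target,
    all_tables=[
        target,
        TableNode("dim_customer", "cur"),
        TableNode("staging_customer", "s"),
    ],
)

sources, self_ref_aliases = extract_source_tables(stmt)

# For comparison: a name-based filter silently drops the self-read.
name_filtered = {n.name for n in stmt.all_tables if n.name != stmt.target.name}
```

With identity-based filtering, `dim_customer` survives as a source and the alias `cur` is recorded as a self-reference; the name-based filter keeps only `staging_customer`.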

Test plan

  • All 23 new tests pass (tests/test_cdc_scd_pipeline.py)
  • All 1425 existing tests pass (0 regressions)
  • Pre-commit hooks pass (ruff format, ruff lint)
  • CI pipeline passes

When a multi-statement pipeline writes and reads the same table (e.g.,
SCD2 MERGE + INSERT on dim_customer), clgraph previously collapsed both
references into a single node, losing the self-read semantic. This adds
self-read node detection via AST node identity, cycle-safe dependency
resolution, query-scoped self-read naming, column-granular cross-query
wiring, and edge role/order annotations. Covers MERGE+INSERT, DELETE+INSERT,
and single-statement INSERT...FROM self patterns.

Closes Gap 4 from the CDC/SCD pipeline gap analysis.
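The column-granular cross-query wiring and edge annotations described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `pipeline_lineage_builder.py`: the `PipelineStatement` class, the `self_read_node` naming scheme, the `"cross_query_self_read"` role value, and the reduced `ColumnEdge` shape are all hypothetical. Only the field names `edge_role` and `statement_order` come from the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class ColumnEdge:
    """Reduced, hypothetical edge shape; only the two annotations are from the PR."""
    source: str
    target: str
    edge_role: str
    statement_order: int


@dataclass
class PipelineStatement:
    """Hypothetical summary of one parsed statement."""
    query_id: str
    target_table: str
    output_columns: List[str]
    self_read_columns: List[str]  # target-table columns the statement also reads


def self_read_node(query_id, table, column):
    # Hypothetical query-scoped naming scheme for self-read nodes.
    return f"{query_id}:self_read:{table}.{column}"


def wire_self_reads(statements):
    """Connect columns written by earlier statements to later self-read nodes."""
    edges = []
    written: Dict[str, Set[str]] = {}  # table -> columns written so far
    for order, stmt in enumerate(statements):
        prior = written.get(stmt.target_table, set())
        for column in stmt.self_read_columns:
            # Wire only columns that an earlier statement actually produced.
            if column in prior:
                edges.append(
                    ColumnEdge(
                        source=f"{stmt.target_table}.{column}",
                        target=self_read_node(stmt.query_id, stmt.target_table, column),
                        edge_role="cross_query_self_read",  # assumed role name
                        statement_order=order,
                    )
                )
        written.setdefault(stmt.target_table, set()).update(stmt.output_columns)
    return edges


pipeline = [
    PipelineStatement("q0", "dim_customer",
                      ["customer_id", "is_current", "valid_to"],
                      ["customer_id", "is_current"]),   # MERGE
    PipelineStatement("q1", "dim_customer",
                      ["customer_id", "name", "is_current"],
                      ["customer_id", "is_current"]),   # INSERT
]

edges = wire_self_reads(pipeline)
```

In this sketch the first statement gets no cross-query edges because nothing has written `dim_customer` before it; the second statement's self-read nodes are each fed by the column the MERGE produced.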
@mingjerli mingjerli merged commit e29d86f into main Apr 14, 2026
8 checks passed