feat: CDC/SCD example notebook + CI notebook wiring #66

Merged
mingjerli merged 1 commit into main from feat/cdc-scd-notebook on Apr 16, 2026
@mingjerli (Owner) commented Apr 15, 2026

Summary

  • Adds examples/cdc_scd_pipeline.ipynb, an end-to-end Debezium CDC → SCD Type 2 → fact → mart pipeline (6 statements, 4 layers, Databricks dialect). It stress-tests clgraph against realistic production shapes and showcases all 10 column-lineage features recently closed by the CDC/SCD gap program (gaps 1–10).
  • Wires python run_all_notebooks.py --skip-llm into CI as a new notebooks job in .github/workflows/ci.yml, so every PR now checks that each example notebook executes cleanly (previously this was unenforced).
  • Updates docs/superpowers/specs/2026-04-13-cdc-scd-pipeline-gaps-design.md so that the progress summary, gap table, fix list, deliverables, and acceptance criteria reflect that gaps 1, 2, and 8 are now closed, alongside gaps 3, 4, 5, 6, 7, 9, and 10. All 10 gaps are resolved.
  • Polishes gap 4 & gap 7 design docs with accurate line references and edge-role naming (cross_query_self_ref). No behavior change.
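For readers new to the raw CDC shape: a Debezium change event wraps the row image in an envelope whose `after` field is a struct, and the staging step keeps only the newest event per key. The plain-Python sketch below mimics what `after.id`-style struct access plus `ROW_NUMBER() ... WHERE rn = 1` dedup compute. It is illustrative only; the sample events, the `ts_ms` ordering key, and the helper name `latest_per_key` are assumptions, not taken from the notebook or from clgraph.

```python
from typing import Any

# Hypothetical Debezium-style change events (sample data, not from the notebook).
# The row image lives in the `after` struct; `op` and `ts_ms` are envelope metadata.
EVENTS: list[dict[str, Any]] = [
    {"op": "c", "ts_ms": 1000, "after": {"id": 1, "name": "Ada", "email": "ada@old.example"}},
    {"op": "u", "ts_ms": 2000, "after": {"id": 1, "name": "Ada", "email": "ada@new.example"}},
    {"op": "c", "ts_ms": 1500, "after": {"id": 2, "name": "Grace", "email": "grace@example"}},
]


def latest_per_key(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the newest event per after.id, emulating
    ROW_NUMBER() OVER (PARTITION BY after.id ORDER BY ts_ms DESC) AS rn ... WHERE rn = 1.
    """
    best: dict[Any, dict[str, Any]] = {}
    for event in events:
        row = event["after"]
        if row is None:  # Debezium deletes carry after = null; this sketch skips them
            continue
        key = row["id"]  # struct dot-access: after.id
        if key not in best or event["ts_ms"] > best[key]["ts_ms"]:
            best[key] = event
    # Project the after.* fields into flat staging columns, ordered by key.
    return [
        {**e["after"], "ts_ms": e["ts_ms"]}
        for e in sorted(best.values(), key=lambda e: e["after"]["id"])
    ]


staged = latest_per_key(EVENTS)
for row in staged:
    print(row)
```

Each surviving staging column traces back to a nested path under `after`, which is the kind of struct edge the gap 1 row below counts.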

What the notebook captures on the pipeline

| Gap | What's demonstrated | Edges on this pipeline |
|---|---|---|
| 1 | Struct dot-access on CDC envelope (after.id, after.name, …) | 5 struct edges, access_type=struct, nested_path |
| 2 | Dedup via subquery + WHERE rn = 1 | 1 qualify edge, qualify_function=ROW_NUMBER |
| 3/10 | MERGE WHEN MATCHED AND (…) condition columns | 13 edges, merge_column_role=condition |
| 4 | Self-referencing target (MERGE then INSERT on dim_customer) | 4 self-read columns, 4 cross-query self-ref edges, 28 prior-state-read edges |
| 5 | Literal-only outputs ('Y' AS is_active) | terminal nodes, no upstream edges |
| 6 | current_timestamp() function-only source | output nodes, no incoming edges |
| 7 | JOIN ON predicate columns incl. BETWEEN (point-in-time fact join) | 5 predicate edges, is_join_predicate=True |
| 8 | WHERE clause columns | 50 where_filter edges with where_condition metadata |
| 9 | MERGE ON literal-bound predicate (t.is_active = 'Y') | 1 merge_match_filter edge |

Total: 95 edges across 40 columns, 5 queries.
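Gaps 3/10, 4, 5, 6, and 9 all come from the SCD Type 2 "close-then-open" step: a MERGE that matches only the current version (`t.is_active = 'Y'`) and closes it `WHEN MATCHED AND (...)` a tracked column changed, followed by an INSERT that re-reads the dimension and opens a new version with a literal `'Y'` and `current_timestamp()`. The sketch below reproduces those semantics on in-memory rows to show why `dim_customer` reads its own prior state. It is an assumption-laden illustration, not the notebook's SQL: the columns `valid_from`/`valid_to`, the `TRACKED` tuple, and the helper `scd2_apply` are hypothetical.

```python
from datetime import datetime
from typing import Any

TRACKED = ("name", "email")  # hypothetical type-2 tracked columns


def scd2_apply(
    dim: list[dict[str, Any]], staged: list[dict[str, Any]], now: datetime
) -> list[dict[str, Any]]:
    """Close-then-open SCD Type 2 update; assumes `staged` has one row per id."""
    dim = [dict(row) for row in dim]  # copy so the caller's rows are untouched

    # Step 1 (MERGE): match only the current version, like
    # ON t.id = s.id AND t.is_active = 'Y', and close it when a tracked column changed.
    current = {row["id"]: row for row in dim if row["is_active"] == "Y"}
    for src in staged:
        tgt = current.get(src["id"])
        if tgt is not None and any(tgt[c] != src[c] for c in TRACKED):
            tgt["is_active"] = "N"
            tgt["valid_to"] = now

    # Step 2 (INSERT): re-read the dimension's own state (the self-reference) and
    # open a version for every staged key that no longer has an active row.
    active_ids = {row["id"] for row in dim if row["is_active"] == "Y"}
    for src in staged:
        if src["id"] not in active_ids:
            dim.append({
                "id": src["id"],
                **{c: src[c] for c in TRACKED},
                "is_active": "Y",   # literal-only output column
                "valid_from": now,  # stands in for current_timestamp()
                "valid_to": None,
            })
    return dim


t0, t1 = datetime(2026, 1, 1), datetime(2026, 2, 1)
dim_customer = [{"id": 1, "name": "Ada", "email": "ada@old.example",
                 "is_active": "Y", "valid_from": t0, "valid_to": None}]
staged = [{"id": 1, "name": "Ada", "email": "ada@new.example"},
          {"id": 2, "name": "Grace", "email": "grace@example"}]

result = scd2_apply(dim_customer, staged, t1)
for row in result:
    print(row["id"], row["email"], row["is_active"], row["valid_to"])
```

Splitting the update into a close step and an insert step keeps each statement simple, at the cost of the second statement depending on the first statement's output on the same table; that dependency is what the cross-query self-ref edges describe.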

Test plan

  • uv run pytest tests/ — 1562 passed, 41 skipped, 2 xfailed (unchanged from baseline)
  • uv run python run_all_notebooks.py --skip-llm — 27/27 pass (including the new notebook)
  • uv run ruff check examples/cdc_scd_pipeline.ipynb — clean
  • uv run ruff format --check examples/cdc_scd_pipeline.ipynb — clean
  • CI green on new notebooks job (verify after push)

Adds examples/cdc_scd_pipeline.ipynb, a showcase notebook that
stress-tests clgraph against a realistic Debezium CDC + SCD Type 2
pipeline in the Databricks dialect (6 statements over a raw CDC source
and 4 layers: staging, dim, fact, mart). It demonstrates all 10
column-lineage features recently closed by the CDC/SCD gap program:

- Gap 1 struct dot-access (after.id, after.name, ...)
- Gap 2 dedup via subquery + WHERE rn = 1 (qualify promotion)
- Gap 3/10 MERGE WHEN MATCHED condition columns
- Gap 4 self-referencing target (close-then-open SCD2)
- Gap 5 literal-only output columns ('Y' AS is_active)
- Gap 6 function-only outputs (current_timestamp())
- Gap 7 JOIN ON predicate columns, including BETWEEN
- Gap 8 WHERE clause filter columns
- Gap 9 MERGE ON literal-bound predicate (t.is_active = 'Y')
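The BETWEEN case in gap 7 is a point-in-time fact join: each fact event is matched to the dimension version that was valid when the event happened, so both the key and the validity-window columns act as join predicate columns. The sketch below illustrates that predicate in plain Python. It assumes hypothetical `valid_from`/`valid_to` columns (with `valid_to = None` for the current row) and made-up `ORDERS` data; it is not clgraph code or the notebook's exact join.

```python
from datetime import datetime
from typing import Any

# Hypothetical dim_customer versions from an SCD2 process (valid_to None = current).
DIM_VERSIONS: list[dict[str, Any]] = [
    {"id": 1, "email": "ada@old.example",
     "valid_from": datetime(2026, 1, 1), "valid_to": datetime(2026, 2, 1)},
    {"id": 1, "email": "ada@new.example",
     "valid_from": datetime(2026, 2, 1), "valid_to": None},
]
# Hypothetical fact events.
ORDERS: list[dict[str, Any]] = [
    {"order_id": 10, "customer_id": 1, "event_ts": datetime(2026, 1, 15)},
    {"order_id": 11, "customer_id": 1, "event_ts": datetime(2026, 3, 1)},
]


def point_in_time_join(
    facts: list[dict[str, Any]], dim_versions: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Mirror: JOIN dim d ON f.customer_id = d.id
    AND f.event_ts BETWEEN d.valid_from AND COALESCE(d.valid_to, <far future>)
    """
    out = []
    for f in facts:
        for d in dim_versions:
            valid_to = d["valid_to"] if d["valid_to"] is not None else datetime.max
            # BETWEEN is inclusive on both ends, so an event exactly on a version
            # boundary would match two versions; half-open windows avoid that.
            if f["customer_id"] == d["id"] and d["valid_from"] <= f["event_ts"] <= valid_to:
                out.append({**f, "email_at_event": d["email"]})
    return out


joined = point_in_time_join(ORDERS, DIM_VERSIONS)
for row in joined:
    print(row["order_id"], row["email_at_event"])
```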

Wires `python run_all_notebooks.py --skip-llm` into CI as a new
`notebooks` job in .github/workflows/ci.yml so notebook execution is
enforced going forward.

Also marks the CDC/SCD design doc as fully resolved (all 10 gaps
closed) and polishes the gap 4 / gap 7 design docs with accurate line
references and edge-role naming. These doc edits change no code behavior.
@mingjerli merged commit 36fe0df into main on Apr 16, 2026
9 checks passed