feat: MERGE condition columns in column lineage (Gaps 3, 9, 10) #64
Merged
Adds a merge_column_role: Optional[str] field to ColumnEdge to distinguish 'value' (RHS assignment) edges from 'condition' (WHEN clause gating) edges. No behavior change yet. Refs #63
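A minimal sketch of how the new field might sit on the edge model, assuming a plain dataclass. Only merge_column_role and the edge-type names come from this PR; the source/target/edge_type fields, ROLE_BY_EDGE_TYPE and make_edge are hypothetical. Defaulting the field to None leaves every existing edge unchanged, which is consistent with "No behavior change yet".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnEdge:
    """Simplified stand-in for ColumnEdge; only merge_column_role is from the PR."""

    source: str      # hypothetical field
    target: str      # hypothetical field
    edge_type: str   # e.g. "merge_match", "merge_update", "merge_match_filter"
    # None = ON-clause match, "value" = SET/INSERT right-hand side,
    # "condition" = WHEN guard or ON-clause literal filter.
    merge_column_role: Optional[str] = None


# Hypothetical lookup from edge type to role.
ROLE_BY_EDGE_TYPE = {
    "merge_match": None,
    "merge_update": "value",
    "merge_match_filter": "condition",
}


def make_edge(source: str, target: str, edge_type: str) -> ColumnEdge:
    """Build an edge tagged with the role implied by its type (None if unknown)."""
    return ColumnEdge(source, target, edge_type, ROLE_BY_EDGE_TYPE.get(edge_type))
```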
Column-literal EQ pairs in ON clause (e.g., t.is_active = 'Y') are now captured in match_filter_columns. Column-column pairs continue to flow through match_columns unchanged. Refs #63
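The ON-clause split can be sketched as a partition over equality pairs. This assumes a hypothetical representation in which columns are "alias.column" strings and literals are wrapped in a Literal marker class; split_on_pairs is illustrative, not the parser's actual code.

```python
from typing import List, Tuple, Union


class Literal(str):
    """Marks an operand as a SQL literal (e.g. 'Y') rather than a column."""


Operand = Union[str, Literal]
Pair = Tuple[str, str]


def split_on_pairs(pairs: List[Tuple[Operand, Operand]]) -> Tuple[List[Pair], List[Pair]]:
    """Partition ON-clause equalities into (match_columns, match_filter_columns)."""
    match_columns: List[Pair] = []
    match_filter_columns: List[Pair] = []
    for left, right in pairs:
        left_is_literal = isinstance(left, Literal)
        right_is_literal = isinstance(right, Literal)
        if not left_is_literal and not right_is_literal:
            # column = column: a join key, handled exactly as before
            match_columns.append((left, right))
        elif left_is_literal != right_is_literal:
            # column = literal (either order): normalize to (column, value)
            column, value = (right, left) if left_is_literal else (left, right)
            match_filter_columns.append((str(column), str(value)))
        # literal = literal carries no column lineage and is dropped
    return match_columns, match_filter_columns
```

For ON t.customer_id = s.customer_id AND t.is_active = 'Y', the join key lands in match_columns and the literal-bound pair lands in match_filter_columns.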
Literal-bound ON predicates now produce merge_match_filter edges with merge_column_role='condition'. The target-side column (e.g., dim_customer.is_active) appears in lineage as a self-referencing condition dependency. Refs #63
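One way to turn those filter columns into self-referencing condition edges, assuming a hypothetical alias map and dict-shaped edges (resolve and match_filter_edges are illustrative names, not functions from this PR):

```python
from typing import Dict, List, Tuple


def resolve(column_ref: str, aliases: Dict[str, str]) -> str:
    """Map 'alias.column' to 'table.column'; unknown aliases pass through."""
    alias, _, column = column_ref.partition(".")
    return f"{aliases.get(alias, alias)}.{column}"


def match_filter_edges(
    match_filter_columns: List[Tuple[str, str]], aliases: Dict[str, str]
) -> List[Dict[str, str]]:
    """Emit one self-referencing condition edge per literal-bound ON column."""
    edges = []
    for column_ref, _literal in match_filter_columns:
        node = resolve(column_ref, aliases)
        edges.append(
            {
                "source": node,
                "target": node,  # the column is a condition on its own match
                "edge_type": "merge_match_filter",
                "merge_column_role": "condition",
            }
        )
    return edges
```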
…ap 3+10) Reuses extract_columns_from_expr to extract column references from WHEN MATCHED AND conditions. trace_merge_columns emits condition edges with merge_column_role='condition'. Impact analysis on condition columns (e.g., staging.name -> dim_customer.end_time) now works. Also tags value-assignment edges with merge_column_role='value' and keeps merge_match edges with merge_column_role=None. Refs #63
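A rough sketch of condition extraction for WHEN MATCHED AND clauses. It stands in for extract_columns_from_expr with a naive regex, so it only recognizes qualified alias.column references and blanks out single-quoted literals first. condition_columns, matched_update_edges and the (source, target, role) tuple shape are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Single-quoted SQL string literals, including '' escapes.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Qualified references such as t.name or staging.updated_at.
_QUALIFIED_COLUMN = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def condition_columns(expr: str) -> List[str]:
    """Return alias.column references in first-seen order, ignoring string literals."""
    without_literals = _STRING_LITERAL.sub("''", expr)
    refs: List[str] = []
    for alias, column in _QUALIFIED_COLUMN.findall(without_literals):
        ref = f"{alias}.{column}"
        if ref not in refs:
            refs.append(ref)
    return refs


def matched_update_edges(
    condition: str,
    assignments: Dict[str, str],
    target_table: str,
    aliases: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Emit value edges from each SET right-hand side and condition edges from the guard."""

    def resolve(ref: str) -> str:
        alias, _, column = ref.partition(".")
        return f"{aliases.get(alias, alias)}.{column}"

    guards = [resolve(ref) for ref in condition_columns(condition)]
    edges: List[Tuple[str, str, str]] = []
    for target_column, rhs in assignments.items():
        target = f"{target_table}.{target_column}"
        for ref in condition_columns(rhs):
            edges.append((resolve(ref), target, "value"))
        for guard in guards:
            edges.append((guard, target, "condition"))
    return edges
```

With an SCD2-style guard such as (t.name <> s.name) and an assumed SET end_time = s.updated_at, both staging.name and dim_customer.name become condition parents of dim_customer.end_time, which is what lets impact analysis on staging.name reach end_time.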
Applies the same condition_columns extraction to WHEN NOT MATCHED INSERT actions, so conditional inserts like 'AND s.op = c' produce condition-gating lineage edges. Refs #63
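Because WHEN NOT MATCHED INSERT actions reuse the same condition columns, edge emission can treat every action uniformly. A sketch assuming a hypothetical WhenClause record whose condition columns and assignment sources are already resolved to table.column nodes; clause_edges is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WhenClause:
    """Hypothetical resolved MERGE action; nodes are 'table.column' strings."""

    kind: str  # e.g. "matched_update" or "not_matched_insert"
    condition_columns: List[str] = field(default_factory=list)
    # target column -> source nodes on the assigned/inserted right-hand side
    assignments: Dict[str, List[str]] = field(default_factory=dict)


def clause_edges(clauses: List[WhenClause], target_table: str) -> List[Tuple[str, str, str]]:
    """Emit value and condition edges the same way for UPDATE and INSERT actions."""
    edges: List[Tuple[str, str, str]] = []
    for clause in clauses:
        for target_column, sources in clause.assignments.items():
            target = f"{target_table}.{target_column}"
            edges.extend((source, target, "value") for source in sources)
            # every column in the guard gates every column the action writes
            edges.extend((guard, target, "condition") for guard in clause.condition_columns)
    return edges
```

An unconditional INSERT simply has an empty condition_columns list and produces value edges only.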
End-to-end tests verify all three gaps (3, 9, 10) working together on the canonical SCD2 MERGE pattern from the CDC pipeline design. Refs #63
Adds merge_column_role to the MERGE metadata block in JSONExporter so condition vs value edges are visible in exported data. Also fixes pipeline_lineage_builder.py to propagate merge_column_role when reconstructing ColumnEdge objects during pipeline assembly — without this, the field was always None even though trace_strategies set it correctly. Refs #63
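The propagation bug described above is the usual hazard of rebuilding a dataclass field by field. A sketch, assuming a simplified ColumnEdge and a hypothetical nested "merge" block in the exported record (export_edge, rebuild_without_role and rebuild_with_role are illustrative names): dataclasses.replace copies every field that is not explicitly overridden, so merge_column_role survives reconstruction.

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ColumnEdge:
    """Simplified stand-in for ColumnEdge."""

    source: str
    target: str
    edge_type: str
    merge_column_role: Optional[str] = None


def export_edge(edge: ColumnEdge) -> Dict[str, Any]:
    """Serialize an edge, nesting MERGE metadata in a hypothetical 'merge' block."""
    record: Dict[str, Any] = {"source": edge.source, "target": edge.target, "edge_type": edge.edge_type}
    if edge.edge_type.startswith("merge_"):
        record["merge"] = {"merge_column_role": edge.merge_column_role}
    return record


def rebuild_without_role(edge: ColumnEdge, new_source: str) -> ColumnEdge:
    """Field-by-field copy that forgets merge_column_role (the failure mode)."""
    return ColumnEdge(source=new_source, target=edge.target, edge_type=edge.edge_type)


def rebuild_with_role(edge: ColumnEdge, new_source: str) -> ColumnEdge:
    """dataclasses.replace copies all other fields, including merge_column_role."""
    return dataclasses.replace(edge, source=new_source)
```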
Examples 5-8 demonstrate SCD2 condition dependencies, impact analysis, ON clause literal filters, and JSON export with merge_column_role. Refs #63
Summary
- ON-clause column-literal predicates (e.g., t.is_active = 'Y') are now extracted as match_filter_columns and produce merge_match_filter lineage edges.
- WHEN conditions (e.g., WHEN MATCHED AND (t.name <> s.name)) are parsed into column refs and emitted as condition-gating edges upstream of the assigned target columns.
- A new merge_column_role field on ColumnEdge distinguishes None (match), "value" (SET RHS), and "condition" (WHEN guard / ON filter).

Changes
- models.py: add merge_column_role field to ColumnEdge
- query_parser.py: capture column-literal ON pairs in match_filter_columns
- column_extractor.py: extract condition_columns; emit merge_match_filter col_info
- trace_strategies.py: trace_merge_columns with _resolve_to_node helper; emit condition-gating edges
- lineage_builder.py: add merge_match_filter to dispatch
- pipeline_lineage_builder.py: propagate merge_column_role during pipeline edge reconstruction
- export.py: include merge_column_role in JSON export

+489 lines, -7 lines across 8 files (19 new tests)
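To illustrate why the role tag matters for impact analysis, here is a breadth-first traversal that optionally filters by merge_column_role. The (source, target, role) tuples, the impacted function and SCD2_EDGES are hypothetical; the edges only mirror the SCD2 example, where staging.name reaches dim_customer.end_time solely through a condition edge.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str, Optional[str]]  # (source, target, merge_column_role)


def impacted(edges: Iterable[Edge], start: str, roles: Optional[Set[Optional[str]]] = None) -> Set[str]:
    """Columns reachable downstream of start; if roles is given, follow only those roles."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for source, target, role in edges:
        if roles is None or role in roles:
            graph[source].append(target)
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Illustrative edges, not taken from the PR's test data.
SCD2_EDGES: List[Edge] = [
    ("staging.customer_id", "dim_customer.customer_id", None),
    ("staging.name", "dim_customer.name", "value"),
    ("staging.name", "dim_customer.end_time", "condition"),
    ("dim_customer.name", "dim_customer.end_time", "condition"),
]
```

Following all roles from staging.name reaches both dim_customer.name and dim_customer.end_time; restricting to {"value"} reaches only dim_customer.name, which is the blind spot that condition edges close.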
Test plan
- Impact analysis: staging.name → dim_customer.end_time via condition dependency
- Role tagging (merge_match → None, merge_update → "value", condition → "condition")
- Export and pipeline reconstruction preserve merge_column_role

Closes #63
🤖 Generated with Claude Code