Python: add new shared-SSA-backed SSA adapter (additive)#21923
Open
yoff wants to merge 1 commit into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds an additive, shared-SSA-backed SSA adapter for Python on top of the new shared CFG facade, along with library test packs to validate the new SSA in isolation and against legacy ESSA as preparation for the upcoming dataflow migration.
Changes:
- Introduce
semmle.python.dataflow.new.internal.SsaImpl(shared-SSA instantiation + ESSA-shaped adapter surface). - Add a new inline-expectations SSA test pack (
dataflow-new-ssa) and a “vs legacy ESSA” comparison pack (dataflow-new-ssa-vs-legacy). - Add a Python library change note documenting the new adapter.
Show a summary per file
| File | Description |
|---|---|
| python/ql/test/library-tests/dataflow-new-ssa/test.py | New Python corpus for inline SSA def/use/phi expectations. |
| python/ql/test/library-tests/dataflow-new-ssa/SsaTest.ql | Inline-expectations driver query for the new SSA adapter. |
| python/ql/test/library-tests/dataflow-new-ssa/SsaTest.expected | Snapshot of current inline-expectations outcomes. |
| python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/test.py | Shared corpus used to compare new SSA vs legacy ESSA. |
| python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/CmpTest.ql | Query comparing def signatures between new SSA and legacy ESSA. |
| python/ql/test/library-tests/dataflow-new-ssa-vs-legacy/CmpTest.expected | Snapshot of current “def-only-old/new” diffs. |
| python/ql/lib/semmle/python/dataflow/new/internal/SsaImpl.qll | New shared-SSA-backed SSA adapter and ESSA-shaped compatibility layer. |
| python/ql/lib/change-notes/2026-05-19-add-shared-ssa.md | Change note entry for the new SSA adapter. |
Copilot's findings
- Files reviewed: 8/8 changed files
- Comments generated: 3
Comment on lines
+3
to
+7
| # The shared SSA implementation prunes its construction by liveness: | ||
| # definitions of variables that are not read are never materialised. | ||
| # This is by design — write-only variables would only bloat the SSA | ||
| # graph. Tests therefore must always include a read of each variable | ||
| # being verified. |
Comment on lines
+15
to
+16
| * - Captured / closure variables (gap: new SSA does not yet model | ||
| * closure captures). |
Comment on lines
+1
to
+3
| --- | ||
| category: minorAnalysis | ||
| --- |
This was referenced Jun 1, 2026
yoff
pushed a commit
that referenced
this pull request
Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff
pushed a commit
that referenced
this pull request
Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yoff
pushed a commit
that referenced
this pull request
Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
b547f1b to
370dc98
Compare
28062e1 to
4d4b916
Compare
yoff
pushed a commit
that referenced
this pull request
Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
370dc98 to
0ca4cab
Compare
Preparatory refactor for the shared-CFG dataflow migration. Adds the new Python SSA adapter additively, without changing any production behaviour. Library additions: - semmle.python.dataflow.new.internal.SsaImpl — Python SSA implementation built on the new (shared) CFG. Mirrors the Java SSA adapter (java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll): an InputSig is defined in terms of positional (BasicBlock, int) variable references, and the shared codeql.ssa.Ssa::Make<Location, Cfg, Input> module is then instantiated. SourceVariable is the AST-level Py::Variable. Variable references are looked up via the new CFG facade's NameNode.defines/uses/deletes predicates (added in the preceding PR), which themselves are one-line bridges to AST-level Name.defines/uses/deletes. Implicit-entry definitions are inserted for non-local/global/builtin reads, captured variables, and (when needed) parameters. Test additions: - library-tests/dataflow-new-ssa/ — exercises the new SSA over a representative test corpus and checks expected def/use chains. - library-tests/dataflow-new-ssa-vs-legacy/ — runs both new SSA and legacy ESSA over the same corpus and diffs the results, so any semantic divergence shows up as a test failure. Production impact: None. The new SSA adapter has zero callers in lib/ and src/ — the legacy ESSA SSA (semmle/python/essa/*) remains the default. The dataflow library is not migrated yet; that lands in a follow-up PR. Verified by: - All 367 lib + src + consistency-queries compile clean. - All 641 ControlFlow + PointsTo + dataflow + essa + consistency library-tests pass. - Both new dataflow-new-ssa[/vs-legacy] test packs pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4d4b916 to
6313e70
Compare
yoff
pushed a commit
that referenced
this pull request
Jun 1, 2026
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Preparatory refactor for the shared-CFG dataflow migration (#21894).
Based on #21921 — merge that first.
Adds the new Python SSA adapter additively, without changing any production behaviour.
Library addition
semmle.python.dataflow.new.internal.SsaImpl— Python SSA implementation built on the new (shared) CFG.Mirrors the Java SSA adapter at
java/ql/lib/semmle/code/java/dataflow/internal/SsaImpl.qll: anInputSigis defined in terms of positional(BasicBlock, int)variable references, and the sharedcodeql.ssa.Ssa::Make<Location, Cfg, Input>module is then instantiated.SourceVariableis the AST-levelPy::Variable. Variable references are looked up via the new CFG facade'sNameNode.defines/uses/deletespredicates (added in the preceding PR), which themselves are one-line bridges to AST-levelName.defines/uses/deletes.Implicit-entry definitions are inserted for non-local/global/builtin reads, captured variables, and (when needed) parameters.
Test additions
library-tests/dataflow-new-ssa/— exercises the new SSA over a representative test corpus and checks expected def/use chains.library-tests/dataflow-new-ssa-vs-legacy/— runs both new SSA and legacy ESSA over the same corpus and diffs the results, so any semantic divergence shows up as a test failure.Production impact
None. The new SSA adapter has zero callers in
lib/andsrc/— the legacy ESSA SSA (semmle/python/essa/*) remains the default. The dataflow library is not migrated yet; that lands in #21894 (which will shrink considerably once this stack merges).Verification
lib/+src/+consistency-queries/queries compile clean.dataflow-new-ssa[/vs-legacy]test packs pass.Notes for reviewers
SsaImpl.qllis the only non-test addition; ~290 LOC. Mostly theInputSigglue + the implicit-entry-definition logic.CmpTest.expected.