Skip to content

C++: Handle field initialization via NSDMI in IR generation#21391

Open
jketema wants to merge 19 commits intogithub:mainfrom
jketema:jketema/nsdmi
Open

C++: Handle field initialization via NSDMI in IR generation#21391
jketema wants to merge 19 commits intogithub:mainfrom
jketema:jketema/nsdmi

Conversation

@jketema
Copy link
Contributor

@jketema jketema commented Feb 27, 2026

Handle NSDMI in IR generation. Commit-by-commit review is recommended.

There are effectively two options here:

a. Generate a function for each default initialization that does the initialization
b. Duplicate the default initialization in each constructor that uses it

The former works better with the invariant we have in the IR that each AST node maps to at most one IR node. Hence, is the approach taken here.

For the generated functions there are again several options:

  1. Initialize the field in the generated function by passing the this pointer to the function
  2. Pass the field address to the function and initialize through that address (we will still need to pass the this pointer, as other fields may need to be accessed during initialization)
  3. Have the function return the value to be used for the initialization and update the field in any constructor that need the default initialization (again we will need to pass the this pointer).

Option (1) is the option taken here. I did not explore (2), as it seemed a more complicated version of (1), where a little bit of code could have been shared with direct field initialization (getting the field address). Option (3) would have allowed for more sharing with direct field initialization. I explored (3) for a bit, but that ended me up in rabbit hole where I needed to modify much more IR generation, because some of the expression translations expect that they have direct access to a field address (which was no longer passed to the initialization function).

The disadvantage of (1) compared to (3) is that data flow does not work out-of-the box. However, this should be straightforward to fix.

@github-actions github-actions bot added the C++ label Feb 27, 2026
@jketema jketema force-pushed the jketema/nsdmi branch 2 times, most recently from 671d8f6 to 0a26534 Compare March 3, 2026 15:46
@jketema jketema force-pushed the jketema/nsdmi branch 2 times, most recently from bc2663d to ce86619 Compare March 24, 2026 09:57
@jketema jketema force-pushed the jketema/nsdmi branch 2 times, most recently from 823eaab to 61b75e3 Compare March 24, 2026 14:21
@jketema jketema marked this pull request as ready for review March 24, 2026 15:00
@jketema jketema requested a review from a team as a code owner March 24, 2026 15:00
Copilot AI review requested due to automatic review settings March 24, 2026 15:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the C++ IR generation pipeline to represent non-static data member initializers (NSDMI) by generating synthetic IR “functions” for field initializers, and updates the AST/IR APIs and tests accordingly.

Changes:

  • Introduces ConstructorDirectFieldInit and ConstructorDefaultFieldInit to distinguish explicit ctor member-initializer-list initializers from default member initializers.
  • Adds IR translation support for field initializers via a new translated element (TranslatedNonStaticDataMemberVarInit) and associated side-effect plumbing.
  • Updates IR FunctionAddress/Call symbol handling from Function to Declaration, and refreshes test baselines/expectations.

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
cpp/ql/test/library-tests/ir/ir/raw_ir.expected Updates raw IR expectations to include synthetic field-init calls/functions.
cpp/ql/test/library-tests/ir/ir/raw_consistency.expected Updates raw IR consistency expectations (removes one prior issue).
cpp/ql/test/library-tests/ir/ir/ir.cpp Adds StructInit test coverage for NSDMI patterns.
cpp/ql/test/library-tests/ir/ir/aliased_ir.expected Updates aliased IR expectations to include synthetic field-init calls/functions.
cpp/ql/test/library-tests/ir/ir/PrintConfig.qll Adjusts IR dumping config to include fields with initializers.
cpp/ql/test/library-tests/ir/ir/PrintAST.expected Updates AST expectations to new ctor-init node kinds and includes StructInit.
cpp/ql/test/library-tests/dataflow/dataflow-tests/type-bugs.expected Updates expected IR type-bug outputs (now includes new entries).
cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp Adds NSDMI-focused dataflow test cases.
cpp/ql/test/library-tests/dataflow/dataflow-tests/test-source-sink.expected Adds expected source→sink flow for the explicit ctor-init NSDMI case.
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-consistency.expected Updates expected dataflow consistency violations (now includes NSDMI initializer call).
cpp/ql/test/library-tests/ctorinits/ctors.expected Updates ctor-init classification expectations to ConstructorDirectFieldInit.
cpp/ql/lib/semmle/code/cpp/ir/implementation/unaliased_ssa/Instruction.qll Broadens static call target symbol typing from Function to Declaration.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedNonStaticDataMember.qll Adds translation root for non-static data member variable initialization (new).
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedInitialization.qll Extends initialization translation to cover fields and default-field-init calls.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedExpr.qll Routes expression translation through field-init synthetic roots where applicable.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedElement.qll Adds new translated element variants for default field init and field-init roots.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedDeclarationEntry.qll Excludes fields from statement-enclosing callable inference in declaration entries.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedCondition.qll Extends condition translation “enclosing declaration” support to fields.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedCall.qll Extends call side-effect handling to cover default field initializations.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/SideEffects.qll Generalizes “call side effects” expression handling to include default field init.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll Treats fields with initializers as IR-function-backed declarations and variables.
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/Instruction.qll Broadens function-address symbol typing from Function to Declaration.
cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll Broadens function-address symbol typing from Function to Declaration.
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImplCommon.qll Adds explicit casts where static call targets are now Declaration.
cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowNodes.qll Adds explicit casts where static call targets are now Declaration.
cpp/ql/lib/semmle/code/cpp/exprs/Call.qll Adds/defines new ctor field-init AST node classes and documentation.
cpp/ql/lib/change-notes/2026-03-24-field-init.md Documents the new ctor-init AST node classes as a feature.

jketema and others added 2 commits March 24, 2026 16:09
…slatedCall.qll

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@github github deleted a comment from Arsamuding1998Arwae Mar 25, 2026
jketema added 3 commits March 25, 2026 16:31
* Do not generate IR for field initializers from uninstantiated templates.
* Add forgotten case to `TranslatedDeclarationEntry`
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments. Nothing blocking. The most major one is the hasUserVariable comment, but it's mostly about possible simplifications.

There are quite a lot of new missingOperandType, instructionWithoutSuccessor, and ambiguousSuccessors inconsistencies in the first DCA run (from those numbers it looks like nlohmann/json makes heavy use of NSDMI 😅), but I'm hoping that this has been fixed by properly excluding uninstantiated templates.

Comment on lines 37 to +40
result = getEnclosingFunction(expr) or
result = getEnclosingVariable(expr).(GlobalOrNamespaceVariable) or
result = getEnclosingVariable(expr).(StaticInitializedStaticLocalVariable)
result = getEnclosingVariable(expr).(StaticInitializedStaticLocalVariable) or
result = getEnclosingVariable(expr).(Field)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we should probably have some kind of abstract class thingie instead of having to enumerate all the possible Elements for which we generate IRFunctions. I think this enumeration happens in quite a few places at this point.

Doesn't have to be now, though, of course!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I like leaving that as a follow-up.

Comment on lines +43 to +53
op instanceof Opcode::EnterFunction and
tag = EnterFunctionTag() and
type = getVoidType()
or
op instanceof Opcode::AliasedDefinition and
tag = AliasedDefinitionTag() and
type = getUnknownType()
or
op instanceof Opcode::InitializeNonLocal and
tag = InitializeNonLocalTag() and
type = getUnknownType()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also as a potential follow-up: Now that we have this function prologue and epilogue duplicated in three places we could probably move some of it up into the TranslatedRootElement superclass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Again, I like leaving that as a follow-up.

Comment on lines +180 to +182
varUsed.(LocalScopeVariable).getEnclosingElement*() = field
or
varUsed.(Parameter).getCatchBlock().getEnclosingElement*() = field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is a copy-paste from TranslatedGlobalVar (which, again, is probably a copy-paste from TranslatedFunction), but I'll ask anyway:

Can these two cases ever really occur as NSDMI? This would require e.g., a use of a local variable occurring on the right-hand side of an initialisation of a field, right? i.e., C++ would have to allow something like:

int foo(int x) {
  struct S {
    int y = x;
  };
  S s;
  return s.y;
}

which ... I hope isn't possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not allowed indeed: https://godbolt.org/z/zG6jM6PWG

@jketema
Copy link
Contributor Author

jketema commented Mar 25, 2026

There are quite a lot of new missingOperandType, instructionWithoutSuccessor, and ambiguousSuccessors inconsistencies in the first DCA run (from those numbers it looks like nlohmann/json makes heavy use of NSDMI 😅), but I'm hoping that this has been fixed by properly excluding uninstantiated templates.

instructionWithoutSuccessor has improved significantly. The other two persist. Might also just be because we simply end up with more IR that we were throwing away before (the inconsistency numbers for the aliased IR). Not sure whether it makes sense to investigate this further in the context of this PR. Since the numbers are quite small compared with what is likely the total size of the IR, I'm tempted to leave this. We should probably spend a few days somewhere to investigate IR inconsistencies (in general, not just the new ones we see here).

@MathiasVP
Copy link
Contributor

That sounds good to me! instructionWithoutSuccessor is also the one I was most worried about. The other ones I think it's fine to delay. In that case, I think this all LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants