C++: Handle field initialization via NSDMI in IR generation#21391
C++: Handle field initialization via NSDMI in IR generation#21391jketema wants to merge 19 commits intogithub:mainfrom
Conversation
671d8f6 to
0a26534
Compare
bc2663d to
ce86619
Compare
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedInitialization.qll
Fixed
Show fixed
Hide fixed
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedInitialization.qll
Fixed
Show fixed
Hide fixed
823eaab to
61b75e3
Compare
Currently missing: side-effect information for the functions
These need to be looked at, but because data flow through default field initialization is currently not working, let's postpone this as part of that work.
There was a problem hiding this comment.
Pull request overview
This PR extends the C++ IR generation pipeline to represent non-static data member initializers (NSDMI) by generating synthetic IR “functions” for field initializers, and updates the AST/IR APIs and tests accordingly.
Changes:
- Introduces
ConstructorDirectFieldInitandConstructorDefaultFieldInitto distinguish explicit ctor member-initializer-list initializers from default member initializers. - Adds IR translation support for field initializers via a new translated element (
TranslatedNonStaticDataMemberVarInit) and associated side-effect plumbing. - Updates IR
FunctionAddress/Callsymbol handling fromFunctiontoDeclaration, and refreshes test baselines/expectations.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/ql/test/library-tests/ir/ir/raw_ir.expected | Updates raw IR expectations to include synthetic field-init calls/functions. |
| cpp/ql/test/library-tests/ir/ir/raw_consistency.expected | Updates raw IR consistency expectations (removes one prior issue). |
| cpp/ql/test/library-tests/ir/ir/ir.cpp | Adds StructInit test coverage for NSDMI patterns. |
| cpp/ql/test/library-tests/ir/ir/aliased_ir.expected | Updates aliased IR expectations to include synthetic field-init calls/functions. |
| cpp/ql/test/library-tests/ir/ir/PrintConfig.qll | Adjusts IR dumping config to include fields with initializers. |
| cpp/ql/test/library-tests/ir/ir/PrintAST.expected | Updates AST expectations to new ctor-init node kinds and includes StructInit. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/type-bugs.expected | Updates expected IR type-bug outputs (now includes new entries). |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp | Adds NSDMI-focused dataflow test cases. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/test-source-sink.expected | Adds expected source→sink flow for the explicit ctor-init NSDMI case. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-consistency.expected | Updates expected dataflow consistency violations (now includes NSDMI initializer call). |
| cpp/ql/test/library-tests/ctorinits/ctors.expected | Updates ctor-init classification expectations to ConstructorDirectFieldInit. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/unaliased_ssa/Instruction.qll | Broadens static call target symbol typing from Function to Declaration. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedNonStaticDataMember.qll | Adds translation root for non-static data member variable initialization (new). |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedInitialization.qll | Extends initialization translation to cover fields and default-field-init calls. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedExpr.qll | Routes expression translation through field-init synthetic roots where applicable. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedElement.qll | Adds new translated element variants for default field init and field-init roots. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedDeclarationEntry.qll | Excludes fields from statement-enclosing callable inference in declaration entries. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedCondition.qll | Extends condition translation “enclosing declaration” support to fields. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedCall.qll | Extends call side-effect handling to cover default field initializations. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/SideEffects.qll | Generalizes “call side effects” expression handling to include default field init. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll | Treats fields with initializers as IR-function-backed declarations and variables. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/Instruction.qll | Broadens function-address symbol typing from Function to Declaration. |
| cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll | Broadens function-address symbol typing from Function to Declaration. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImplCommon.qll | Adds explicit casts where static call targets are now Declaration. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/DataFlowNodes.qll | Adds explicit casts where static call targets are now Declaration. |
| cpp/ql/lib/semmle/code/cpp/exprs/Call.qll | Adds/defines new ctor field-init AST node classes and documentation. |
| cpp/ql/lib/change-notes/2026-03-24-field-init.md | Documents the new ctor-init AST node classes as a feature. |
cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/Instruction.qll
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedCall.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/unaliased_ssa/Instruction.qll
Show resolved
Hide resolved
cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-consistency.expected
Show resolved
Hide resolved
cpp/ql/test/library-tests/dataflow/dataflow-tests/type-bugs.expected
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/SideEffects.qll
Outdated
Show resolved
Hide resolved
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedElement.qll
Show resolved
Hide resolved
…slatedCall.qll Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/internal/TranslatedElement.qll
Show resolved
Hide resolved
* Do not generate IR for field initializers from uninstantiated templates. * Add forgotten case to `TranslatedDeclarationEntry`
There was a problem hiding this comment.
A couple of comments. Nothing blocking. The most major one is the hasUserVariable comment, but it's mostly about possible simplifications.
There are quite a lot of new missingOperandType, instructionWithoutSuccessor, and ambiguousSuccessors inconsistencies in the first DCA run (from those numbers it looks like nlohmann/json makes heavy use of NSDMI 😅), but I'm hoping that this has been fixed by properly excluding uninstantiated templates.
| result = getEnclosingFunction(expr) or | ||
| result = getEnclosingVariable(expr).(GlobalOrNamespaceVariable) or | ||
| result = getEnclosingVariable(expr).(StaticInitializedStaticLocalVariable) | ||
| result = getEnclosingVariable(expr).(StaticInitializedStaticLocalVariable) or | ||
| result = getEnclosingVariable(expr).(Field) |
There was a problem hiding this comment.
At some point we should probably have some kind of abstract class thingie instead of having to enumerate all the possible Elements for which we generate IRFunctions. I think this enumeration happens in quite a few places at this point.
Doesn't have to be now, though, of course!
There was a problem hiding this comment.
Agreed. I like leaving that as a follow-up.
| op instanceof Opcode::EnterFunction and | ||
| tag = EnterFunctionTag() and | ||
| type = getVoidType() | ||
| or | ||
| op instanceof Opcode::AliasedDefinition and | ||
| tag = AliasedDefinitionTag() and | ||
| type = getUnknownType() | ||
| or | ||
| op instanceof Opcode::InitializeNonLocal and | ||
| tag = InitializeNonLocalTag() and | ||
| type = getUnknownType() |
There was a problem hiding this comment.
Also as a potential follow-up: Now that we have this function prologue and epilogue duplicated in three places we could probably move some of it up into the TranslatedRootElement superclass.
There was a problem hiding this comment.
Agreed. Again, I like leaving that as a follow-up.
| varUsed.(LocalScopeVariable).getEnclosingElement*() = field | ||
| or | ||
| varUsed.(Parameter).getCatchBlock().getEnclosingElement*() = field |
There was a problem hiding this comment.
I realize this is a copy-paste from TranslatedGlobalVar (which, again, is probably a copy-paste from TranslatedFunction), but I'll ask anyway:
Can these two cases ever really occur as NSDMI? This would require e.g., a use of a local variable occurring on the right-hand side of an initialisation of a field, right? i.e., C++ would have to allow something like:
int foo(int x) {
struct S {
int y = x;
};
S s;
return s.y;
}which ... I hope isn't possible?
There was a problem hiding this comment.
That's not allowed indeed: https://godbolt.org/z/zG6jM6PWG
|
|
That sounds good to me! |
Handle NSDMI in IR generation. Commit-by-commit review is recommended.
There are effectively two options here:
a. Generate a function for each default initialization that does the initialization
b. Duplicate the default initialization in each constructor that uses it
The former works better with the invariant we have in the IR that each AST node maps to at most one IR node. Hence, is the approach taken here.
For the generated functions there are again several options:
thispointer to the functionthispointer, as other fields may need to be accessed during initialization)thispointer).Option (1) is the option taken here. I did not explore (2), as it seemed a more complicated version of (1), where a little bit of code could have been shared with direct field initialization (getting the field address). Option (3) would have allowed for more sharing with direct field initialization. I explored (3) for a bit, but that ended me up in rabbit hole where I needed to modify much more IR generation, because some of the expression translations expect that they have direct access to a field address (which was no longer passed to the initialization function).
The disadvantage of (1) compared to (3) is that data flow does not work out-of-the box. However, this should be straightforward to fix.