Add inline asm support for amdgpu#149793
Conversation
This comment has been minimized.
This comment has been minimized.
|
Some changes occurred in compiler/rustc_codegen_gcc |
|
This seems okay to me, but I'd rather someone more familiar with this part of the compiler give the final signoff. @bors r? |
|
Error: Parsing assign command in comment failed: ...'' | error: specify user to assign to at >| ''... Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #triagebot on Zulip. |
|
@bors r? compiler |
|
@rustbot reroll |
bdb726b to
9db5dca
Compare
|
Removed return type from tests to fix conflict with #149991, which starts disallowing returns in |
|
The change seems Ok, i'd like people with more background to take a look. |
|
That's not me (sorry it took me a while because of holidays). But iirc that could be amanieu? r? @Amanieu |
|
☔ The latest upstream changes (presumably #150866) made this pull request unmergeable. Please resolve the merge conflicts. |
This comment has been minimized.
This comment has been minimized.
|
Rebased to fix conflict, no other changes Edit: kdiff3 didn’t like 🦀 when fixing conflicts 😢, fixed now |
This comment has been minimized.
This comment has been minimized.
|
Hey @Amanieu, are you able to take a look at this PR? Your domain expertise would be really valuable in the review process 🙂 |
This comment has been minimized.
This comment has been minimized.
|
Thanks for the review! |
|
Register classes are now split by size and vector types exist now, including tests. I’m not completely convinced that splitting the register classes by size is the best thing to do.
The overlapping check is defined by register (not by class) and this already works correctly without splitting the classes. |
|
Friendly ping for review |
|
in case it's relevant, pinging our new GPU-target group @rustbot ping gpu-target |
|
Hi GPU experts, this issue/PR could use some guidance on how this should be cc @eddyb @fee1-dead @Firestar99 @Flakebi @FractalFir @karolzwolak @kjetilkjeka @kulst @LegNeato @RalfJung @Sa4dUs @workingjubilee @ZuseZ4 |
|
Uh, what? I am not a GPU expert and never agreed to be in this ping group.^^ I'm afraid I don't have capacity to put GPUs on my plate as well. |
| vgpr352, | ||
| vgpr384, | ||
| vgpr512, | ||
| vgpr1024, |
There was a problem hiding this comment.
Considering the sheer variety of register sizes, I think that this should be represented like this instead:
enum AmdgpuInlineAsmRegClass {
Sgpr(u16),
Vgpr(u16),
}Then you would manually implement the relevant methods instead of using the def_reg_class macro.
You will need to add validation during parse to ensure that the register size is valid: it must be a multiple of 32, with an exception for {sv}gpr16 which represents half of a normal register.
There was a problem hiding this comment.
I implemented this in the last commit, it saves 20 lines (yay).
Instead of checking for multiple of 32 I re-used the function that tries converting a class to a symbol. That should be quick to run and is more accurate.
AGPRs are critical for datacenter ML workloads, but this seems like the right call to me at least (not as familiar with amd stuff). |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| ) -> &'static [(InlineAsmType, Option<Symbol>)] { | ||
| match self { | ||
| Self::Vgpr(16) => types! { _: I16, F16; }, | ||
| Self::Sgpr(32) | Self::Vgpr(32) => types! { _: I16, I32, F16, F32, |
There was a problem hiding this comment.
| Self::Sgpr(32) | Self::Vgpr(32) => types! { _: I16, I32, F16, F32, | |
| Self::Sgpr(32) | Self::Vgpr(32) => types! { _: I32, F32, |
Add support for inline assembly for the amdgpu backend (the amdgcn-amd-amdhsa target). Add register classes for `vgpr` (vector general purpose register) and `sgpr` (scalar general purpose register). The LLVM backend supports two more classes, `reg`, which is either VGPR or SGPR, up to the compiler to decide. As instructions often rely on a register being either a VGPR or SGPR for the assembly to be valid, reg doesn’t seem that useful (I struggled to write correct tests for it), so I didn’t end up adding it. The fourth register class is AGPRs, which only exist on some hardware versions (not the consumer ones) and they have restricted ways to write and read from them, which makes it hard to write a Rust variable into them. They could be used inside assembly blocks, but I didn’t add them as Rust register class. There is one change affecting general inline assembly code, that is `InlineAsmReg::name()` now returns a `Cow` instead of a `&'static str`. Because amdgpu has many registers, 256 VGPRs plus combinations of 2 or 4 VGPRs, and I didn’t want to list hundreds of static strings, the amdgpu reg stores the register number(s) and a non-static String is generated at runtime for the register name.
Use just two variants for SGPRs and VGPRs, each with an int for the register size in bits.
Remove support for 64-bit vector types to make it simpler.
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
I kept support for 16-bit values in 32-bit registers as that is quite commonly used (proper 16-bit register support is only available on newer hardware). All new changes are in the last commit 581b55a. |
View all comments
Add support for inline assembly for the amdgpu backend (the amdgcn-amd-amdhsa target).
Add register classes for
vgpr(vector general purpose register) andsgpr(scalar general purpose register).The LLVM backend supports two more classes,
reg, which is either VGPR or SGPR, up to the compiler to decide. As instructions often rely on a register being either a VGPR or SGPR for the assembly to be valid, reg doesn’t seem that useful (I struggled to write correct tests for it), so I didn’t end up adding it.The fourth register class is AGPRs, which only exist on some hardware versions (not the consumer ones) and they have restricted ways to write and read from them, which makes it hard to write a Rust variable into them. They could be used inside assembly blocks, but I didn’t add them as Rust register class.
There is one change affecting general inline assembly code, that is
InlineAsmReg::name()now returns aCowinstead of a&'static str. Because amdgpu has many registers, 256 VGPRs plus combinations of 2 or 4 VGPRs, and I didn’t want to list hundreds of static strings, the amdgpu reg stores the register number(s) and a non-static String is generated at runtime for the register name.Tracking issue: #135024