Commit 86edac4
Comm gemm fixes (#2818)
* Fix GemmRs B descriptor lld for transb=true
With a row_major (1×P) grid, all rows are on a single process row,
so the local leading dimension must be n (full row count), not
block_size(n), which is n/P.
Signed-off-by: Almog Segal <asegal@nvidia.com>
* Set GemmRs communication type to output data type
Match the UserBuffers behavior where the reduce-scatter operates
in the output precision rather than FP32.
Signed-off-by: Almog Segal <asegal@nvidia.com>
---------
Signed-off-by: Almog Segal <asegal@nvidia.com>
1 parent ac96651 commit 86edac4
1 file changed
Lines changed: 8 additions & 3 deletions
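The leading-dimension reasoning in the first fix can be illustrated with a small sketch. This is not the code from the commit: `block_size` and `local_lld` here are hypothetical helpers that assume an even ceiling-division block split and a column-major local block, whose leading dimension is the number of rows a process holds.

```python
def block_size(extent: int, nprocs: int) -> int:
    """Hypothetical even block split: ceil(extent / nprocs)."""
    return -(-extent // nprocs)


def local_lld(global_rows: int, nprow: int) -> int:
    """Leading dimension of a column-major local block.

    Rows are split only across the process rows of the grid, so the
    local row count depends on nprow, not on the total process count.
    """
    return max(1, block_size(global_rows, nprow))


n, P = 4096, 8

# 1xP grid: a single process row, so every process holds all n rows.
lld_fixed = local_lld(n, nprow=1)

# The value the commit message describes as wrong: n split across all P.
lld_old = block_size(n, P)

print(lld_fixed, lld_old)
```

Under these assumptions the corrected value equals the full row count, while the earlier value understates it by a factor of P.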
@@ -186,9 +186,9 @@ (lines 189-191 replaced: 3 deletions, 3 additions; line contents not preserved)
@@ -200,6 +200,11 @@ (new lines 203-207 added: 5 additions; line contents not preserved)
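One practical effect of the second fix, communicating in the output data type rather than FP32, is a smaller reduce-scatter payload. The sketch below is only a back-of-the-envelope estimate, not the library's accounting: it assumes a ring reduce-scatter in which each rank sends (P-1)/P of the buffer, and the `ITEMSIZE` table and function name are hypothetical.

```python
# Bytes per element for a few common data types (illustrative table).
ITEMSIZE = {"fp32": 4, "bf16": 2, "fp16": 2}


def reduce_scatter_bytes_sent(m: int, n: int, nranks: int, dtype: str) -> int:
    """Bytes each rank sends in a ring reduce-scatter of an m x n buffer.

    Assumes the standard ring schedule: (nranks - 1) steps, each moving
    1/nranks of the buffer.
    """
    total = m * n * ITEMSIZE[dtype]
    return total * (nranks - 1) // nranks


m, n, P = 4096, 4096, 8
fp32_bytes = reduce_scatter_bytes_sent(m, n, P, "fp32")
bf16_bytes = reduce_scatter_bytes_sent(m, n, P, "bf16")

print(fp32_bytes, bf16_bytes)
```

With a BF16 output, the estimate is half the FP32 figure for the same matrix shape.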