Triangular Inverse Kernel (continuation of Zouzias' impl)#830
Triangular Inverse Kernel (continuation of Zouzias' impl)#830MirkoDeVita98 wants to merge 9 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new triangular matrix inverse example, including a recursive unrolled AICore kernel, orchestration logic, and a Python test suite. It also updates the benchmark_bgemm test configuration and adds debug logging to the scene test framework. Review feedback suggests removing leftover debug print statements, correcting documentation comments about the configuration tensor layout, and simplifying the kernel dispatch function by removing redundant template parameters.
| print("actual: ", actual[:10]) | ||
| print("diff: ", (actual - expected).abs().mean().item()) |
| num_matrices = params["num_matrices"] | ||
| M = args.M.reshape(num_matrices, 1, n, n).to(torch.float16) | ||
| M_inv = linalg_inv(M) | ||
| print("M_inv (golden):", M_inv.flatten()[:10]) |
There was a problem hiding this comment.
Let drop the print in L123.
| * args[0] = M (INPUT) fp16 triangular matrices [num_matrices, N, N] | ||
| * args[1] = I_neg (INPUT) fp16 negative identity [N, N] | ||
| * args[2] = M_inv (OUTPUT) fp16 result [num_matrices, N, N] | ||
| * args[3] = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower] |
There was a problem hiding this comment.
The comment incorrectly states that the config tensor has 3 elements. The implementation (lines 832-835) and the test case (line 109) use 4 elements: [matrix_size, num_matrices, is_lower, block_dim]. Please update the comment to reflect the actual layout.
* args[3] = config (INPUT) int64[4]: [matrix_size, num_matrices, is_lower, block_dim]References
- Ensure that comments accurately reflect the code's behavior and design.
| * tensor(0) = M (INPUT) fp16 triangular matrices [num_matrices * N * N] | ||
| * tensor(1) = I_neg (INPUT) fp16 negative identity [N * N] | ||
| * tensor(2) = M_inv (OUTPUT) fp16 result [num_matrices * N * N] | ||
| * tensor(3) = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower, block_dim] |
There was a problem hiding this comment.
The comment describes the config tensor as int64[3], but then lists 4 parameters. It should be updated to int64[4] to match the implementation and the Python test.
| * tensor(3) = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower, block_dim] | |
| * tensor(3) = config (INPUT) int64[4]: [matrix_size, num_matrices, is_lower, block_dim] |
References
- Ensure that comments accurately reflect the code's behavior and design.
| template <typename InputT, typename OutputT, uint32_t NumTilesPerCubeIter, bool IsBSND> | ||
| AICORE void run_tri_inv_rec_unroll_per_num_matrices( |
There was a problem hiding this comment.
The template parameters NumTilesPerCubeIter and IsBSND in run_tri_inv_rec_unroll_per_num_matrices are redundant because the function body ignores them and performs its own dispatch using literal values (1, 2, 4 for NumTilesPerCubeIter and true/false for IsBSND). These parameters should be removed from the template signature to avoid confusion and improve maintainability.
References
- Remove unused parameters from functions or APIs to improve code clarity and avoid misleading users.
| @@ -0,0 +1,8 @@ | |||
|
|
|||
There was a problem hiding this comment.
Let's remove this Makefile. I used it for testing purposes (avoids excessive typing :-) )
| for name in output_names: | ||
| actual = getattr(test_args, name) | ||
| expected = getattr(golden_args, name) | ||
| print("actual: ", actual[:10]) |
There was a problem hiding this comment.
Let's remove this diff (it was added for debugging)
| num_matrices = params["num_matrices"] | ||
| M = args.M.reshape(num_matrices, 1, n, n).to(torch.float16) | ||
| M_inv = linalg_inv(M) | ||
| print("M_inv (golden):", M_inv.flatten()[:10]) |
There was a problem hiding this comment.
Let drop the print in L123.
| class TestBenchmarkBgemm(SceneTestCase): | ||
| RTOL = 1e-3 | ||
| ATOL = 1e-3 | ||
| RTOL = 1e-5 |
There was a problem hiding this comment.
Let revert the changes in this file to keep the MR only about the triangular inverse.
To reproduce:
pip install --no-build-isolation -e '.[test]' python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/test_triangular_inverse.py -p a2a3sim