Skip to content

Triangular Inverse Kernel (continuation of Zouzias' impl)#830

Open
MirkoDeVita98 wants to merge 9 commits into
hw-native-sys:mainfrom
MirkoDeVita98:port/zouzias-pr-1
Open

Triangular Inverse Kernel (continuation of Zouzias' impl)#830
MirkoDeVita98 wants to merge 9 commits into
hw-native-sys:mainfrom
MirkoDeVita98:port/zouzias-pr-1

Conversation

@MirkoDeVita98
Copy link
Copy Markdown

@MirkoDeVita98 MirkoDeVita98 commented May 20, 2026

To reproduce:

pip install --no-build-isolation -e '.[test]'
python examples/a2a3/tensormap_and_ringbuffer/triangular_inverse_example/test_triangular_inverse.py -p a2a3sim

@MirkoDeVita98 MirkoDeVita98 changed the title Triangular Inverse Kernel (continuation of Zozuias's impl) Triangular Inverse Kernel (continuation of Zouzias's impl) May 20, 2026
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new triangular matrix inverse example, including a recursive unrolled AICore kernel, orchestration logic, and a Python test suite. It also updates the benchmark_bgemm test configuration and adds debug logging to the scene test framework. Review feedback suggests removing leftover debug print statements, correcting documentation comments about the configuration tensor layout, and simplifying the kernel dispatch function by removing redundant template parameters.

Comment on lines +696 to +697
print("actual: ", actual[:10])
print("diff: ", (actual - expected).abs().mean().item())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These print statements appear to be leftovers from debugging and should be removed to maintain clean test output.

num_matrices = params["num_matrices"]
M = args.M.reshape(num_matrices, 1, n, n).to(torch.float16)
M_inv = linalg_inv(M)
print("M_inv (golden):", M_inv.flatten()[:10])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This print statement seems to be for debugging and should be removed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let drop the print in L123.

* args[0] = M (INPUT) fp16 triangular matrices [num_matrices, N, N]
* args[1] = I_neg (INPUT) fp16 negative identity [N, N]
* args[2] = M_inv (OUTPUT) fp16 result [num_matrices, N, N]
* args[3] = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment incorrectly states that the config tensor has 3 elements. The implementation (lines 832-835) and the test case (line 109) use 4 elements: [matrix_size, num_matrices, is_lower, block_dim]. Please update the comment to reflect the actual layout.

 *   args[3] = config (INPUT)  int64[4]: [matrix_size, num_matrices, is_lower, block_dim]
References
  1. Ensure that comments accurately reflect the code's behavior and design.

* tensor(0) = M (INPUT) fp16 triangular matrices [num_matrices * N * N]
* tensor(1) = I_neg (INPUT) fp16 negative identity [N * N]
* tensor(2) = M_inv (OUTPUT) fp16 result [num_matrices * N * N]
* tensor(3) = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower, block_dim]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment describes the config tensor as int64[3], but then lists 4 parameters. It should be updated to int64[4] to match the implementation and the Python test.

Suggested change
* tensor(3) = config (INPUT) int64[3]: [matrix_size, num_matrices, is_lower, block_dim]
* tensor(3) = config (INPUT) int64[4]: [matrix_size, num_matrices, is_lower, block_dim]
References
  1. Ensure that comments accurately reflect the code's behavior and design.

Comment on lines +778 to +779
template <typename InputT, typename OutputT, uint32_t NumTilesPerCubeIter, bool IsBSND>
AICORE void run_tri_inv_rec_unroll_per_num_matrices(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The template parameters NumTilesPerCubeIter and IsBSND in run_tri_inv_rec_unroll_per_num_matrices are redundant because the function body ignores them and performs its own dispatch using literal values (1, 2, 4 for NumTilesPerCubeIter and true/false for IsBSND). These parameters should be removed from the template signature to avoid confusion and improve maintainability.

References
  1. Remove unused parameters from functions or APIs to improve code clarity and avoid misleading users.

@MirkoDeVita98 MirkoDeVita98 changed the title Triangular Inverse Kernel (continuation of Zouzias's impl) Triangular Inverse Kernel (continuation of Zouzias' impl) May 20, 2026
Comment thread Makefile
@@ -0,0 +1,8 @@

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this Makefile. I used it for testing purposes (avoids excessive typing :-) )

for name in output_names:
actual = getattr(test_args, name)
expected = getattr(golden_args, name)
print("actual: ", actual[:10])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this diff (it was added for debugging)

num_matrices = params["num_matrices"]
M = args.M.reshape(num_matrices, 1, n, n).to(torch.float16)
M_inv = linalg_inv(M)
print("M_inv (golden):", M_inv.flatten()[:10])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let drop the print in L123.

class TestBenchmarkBgemm(SceneTestCase):
RTOL = 1e-3
ATOL = 1e-3
RTOL = 1e-5
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let revert the changes in this file to keep the MR only about the triangular inverse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants