Pull requests: HabanaAI/vllm-hpu-extension
[aice/v1.22.0][WIP] add static moe swiglustep for bf16 (#411), opened Mar 11, 2026 by ranzhejiang (Contributor)
Add dynamic_quant_for_gaudi2.py script to convert model (#387), opened Oct 29, 2025 by wenbinc-Bin
[SW-238300] Disabling dynamic quantization in mlp module (#383), opened Oct 26, 2025 by HolyFalafel
pass chunk_size and global_num_experts to the MoE kernel (#369), opened Sep 19, 2025 by yangulei (Contributor)
Add flag pin_memory to call from hpu.py in vllm (#325), opened Aug 5, 2025 by xuechendi (Contributor)
Fix the fusedsdpa with sliding window alignment issue (#298), opened Jul 17, 2025 by libinta (Contributor)
Draft: Proper chunked prefill bucketing (#295), opened Jul 16, 2025 by kzawora-intel (Collaborator)
Allow usage of fused_block_softmax_adjustment for Qwen with Lazy (#246), opened Jun 27, 2025 by mswiniarsk (Contributor) [Draft]
[SW-225565] Enable triangular softmax with merged prefill (#197), opened May 26, 2025 by kamil-kaczor (Contributor) [Draft]