BBuf/README.md
root@bbuf-gpu-node:~# ./profile --scan
[ok] identity     : Xiaoyu Zhang / BBuf
[ok] role         : Core Developer @ SGLang | SkyworkAI
[ok] focus        : LLM inference, CUDA kernels, AI infrastructure
[ok] preferred io : clean kernels, fast serving, measurable wins
[ok] status       : optimizing the path from model weights to tokens

./current_work

$ bbufctl top --sort impact

PID   SYSTEM                         MODE          SIGNAL
001   SGLang                         serving       high-throughput LLM and VLM inference
002   CUDA / CUTLASS / Triton        kernels       memory bandwidth, latency, occupancy
003   Kernel Pilot                   agents        AI-assisted kernel engineering loops
004   AI Infra Skills                workflows     reproducible infra/debug/profiling playbooks

./selected_systems

SGLang
how-to-optim-algorithm-in-cuda
kernel-pilot
AI-Infra-Auto-Driven-SKILLS

./toolchain

./runtime_stats

[images: GitHub stats, top languages, activity graph]

./contact

root@bbuf-gpu-node:~# cat /etc/contact
GitHub : https://github.com/BBuf
Blog   : https://www.giantpandacv.com
Work   : SGLang / SkyworkAI

Pinned

  1. tvm_mlir_learn (Public)

    A collection of compiler learning resources.

    Python · 2.7k stars · 370 forks

  2. how-to-optim-algorithm-in-cuda (Public)

    How to optimize some algorithms in CUDA.

    Cuda · 3k stars · 276 forks

  3. sgl-project/sglang (Public)

    SGLang is a high-performance serving framework for large language models and multimodal models.

    Python · 28.1k stars · 6k forks

  4. vipshop/cache-dit (Public)

    A PyTorch-native inference engine with caching, parallelism, and quantization for Diffusion Transformers.

    Python · 1.2k stars · 70 forks