feat: U8x64 byte-level ops for palette codec, nibble, byte scan (Pumpkin/SD)
Added to all three tiers (AVX-512 / AVX2 / scalar):
cmpeq_mask(other) → u64 — byte-wise equality, returns bitmask
shr_epi16(imm) → Self — shift right 16-bit lanes (nibble extract)
saturating_sub(other) — max(a-b, 0) per byte (delta subtraction)
unpack_lo_epi8(other) — interleave low bytes (nibble interleave)
unpack_hi_epi8(other) — interleave high bytes
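
For reference, the semantics of the five operations can be sketched as a plain scalar model. This is an illustrative sketch only, not the crate's actual scalar tier: the `U8x64` struct layout, the `u32` type of `imm`, and the private `unpack` helper are assumptions. It also assumes the unpack operations interleave within each 128-bit lane, as the `_mm512_unpacklo_epi8` / `_mm512_unpackhi_epi8` intrinsics do; the real implementation may differ.

```rust
// Hypothetical scalar model of the new ops; layout and signatures are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U8x64(pub [u8; 64]);

impl U8x64 {
    /// Bit i of the result is set when byte i of `self` equals byte i of `other`.
    pub fn cmpeq_mask(self, other: Self) -> u64 {
        let mut mask = 0u64;
        for i in 0..64 {
            if self.0[i] == other.0[i] {
                mask |= 1u64 << i;
            }
        }
        mask
    }

    /// Logical right shift of each little-endian 16-bit lane.
    /// Shift counts above 15 yield zero, matching `_mm512_srli_epi16`.
    pub fn shr_epi16(self, imm: u32) -> Self {
        let mut out = [0u8; 64];
        for lane in 0..32 {
            let v = u16::from_le_bytes([self.0[2 * lane], self.0[2 * lane + 1]]);
            let shifted = if imm > 15 { 0 } else { v >> imm };
            out[2 * lane..2 * lane + 2].copy_from_slice(&shifted.to_le_bytes());
        }
        Self(out)
    }

    /// Per-byte unsigned subtraction that clamps at zero instead of wrapping.
    pub fn saturating_sub(self, other: Self) -> Self {
        let mut out = [0u8; 64];
        for i in 0..64 {
            out[i] = self.0[i].saturating_sub(other.0[i]);
        }
        Self(out)
    }

    /// Assumed behaviour: interleave 8 bytes from each operand per 128-bit lane,
    /// starting at byte offset `half` (0 = low half, 8 = high half) of the lane.
    fn unpack(self, other: Self, half: usize) -> Self {
        let mut out = [0u8; 64];
        for lane in 0..4 {
            let base = lane * 16;
            for j in 0..8 {
                out[base + 2 * j] = self.0[base + half + j];
                out[base + 2 * j + 1] = other.0[base + half + j];
            }
        }
        Self(out)
    }

    pub fn unpack_lo_epi8(self, other: Self) -> Self {
        self.unpack(other, 0)
    }

    pub fn unpack_hi_epi8(self, other: Self) -> Self {
        self.unpack(other, 8)
    }
}
```

A model like this could serve as an oracle when testing the AVX2 and AVX-512 tiers against each other.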
These operations are intended for:
palette_codec.rs — Minecraft-style variable-width bit packing
nibble.rs — 4-bit light level packing (Pumpkin)
byte_scan.rs — NBT format byte scanning
(future) stable_diffusion/ — VAE latent palette encoding via GGUF
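
As an illustration of the 4-bit packing that nibble.rs targets, here is a minimal scalar sketch. The function names `pack_nibbles` and `unpack_nibbles` are hypothetical, and the nibble order (even index in the low nibble, odd index in the high nibble) is an assumption; nibble.rs itself may be organised differently.

```rust
// Hypothetical helpers, not taken from nibble.rs.

/// Packs 4-bit values two per byte: even index in the low nibble,
/// odd index in the high nibble. A trailing odd element is padded with 0.
pub fn pack_nibbles(levels: &[u8]) -> Vec<u8> {
    levels
        .chunks(2)
        .map(|pair| {
            let lo = pair[0] & 0x0F;
            let hi = pair.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}

/// Inverse of `pack_nibbles`: each byte expands to two 4-bit values.
pub fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for &b in packed {
        out.push(b & 0x0F); // low nibble first
        out.push(b >> 4); // then high nibble
    }
    out
}
```

In a vectorized version, the `b >> 4` step is where a 16-bit lane shift followed by a byte mask would plausibly come in, and the interleaving of low and high nibbles is where the unpack operations would apply.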
The three existing modules (palette_codec.rs, nibble.rs, byte_scan.rs) currently use raw _mm256_/_mm512_ intrinsics.
Next step: rewire them to use crate::simd::U8x64 instead.
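
A rough idea of what the rewired byte scan could look like, assuming a mask-returning equality compare. The free function `cmpeq_mask` below is a scalar stand-in for comparing a chunk against a broadcast byte; `find_byte` is a hypothetical name, and neither is taken from byte_scan.rs.

```rust
// Scalar stand-in for a mask-returning compare (assumption, not the crate API).
// Bit i of the result is set when chunk[i] == needle; chunk must be <= 64 bytes.
fn cmpeq_mask(chunk: &[u8], needle: u8) -> u64 {
    debug_assert!(chunk.len() <= 64);
    chunk
        .iter()
        .enumerate()
        .fold(0u64, |m, (i, &b)| m | (((b == needle) as u64) << i))
}

/// Returns the index of the first occurrence of `needle`, scanning 64 bytes
/// at a time and falling back to a plain loop for the tail.
pub fn find_byte(haystack: &[u8], needle: u8) -> Option<usize> {
    let mut chunks = haystack.chunks_exact(64);
    let mut offset = 0;
    for chunk in &mut chunks {
        let mask = cmpeq_mask(chunk, needle);
        if mask != 0 {
            // The lowest set bit is the first matching byte in this chunk.
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 64;
    }
    chunks
        .remainder()
        .iter()
        .position(|&b| b == needle)
        .map(|i| offset + i)
}
```

The point of the mask form is that the scan loop needs only one compare and one `trailing_zeros` per 64-byte chunk, independent of which tier backs the compare.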
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
#76