Add SQ8↔FP16 x86 SIMD distance kernels [MOD-14954] by dor-forer · Pull Request #970 · RedisAI/VectorSimilarity

dor-forer · 2026-05-26T10:32:57Z

Summary

Implement AVX-512 (F+BW+VL+VNNI) / AVX2+FMA / AVX2 / SSE4 kernels for asymmetric SQ8↔FP16 distance (IP / Cosine / L2²) on Intel x86.
Wire the new kernels into the dispatcher (IP_space.cpp / L2_space.cpp) with F16C gating on the AVX2/SSE4 tiers; AVX-512 path uses _mm512_cvtph_ps which is part of AVX512F (no F16C requirement).
Append -mf16c conditionally to AVX2_FMA / AVX2 / SSE4 dispatcher source files in CMake (purely additive — no SQ8_FP32 / SQ8_SQ8 / INT8 / UINT8 codegen change since those sources contain no F16C intrinsics).
Extend SQ8_FP16_SpacesOptimizationTest to walk all four ISA tiers against the scalar baseline across dim ∈ [16, 32]; update existing GetDistFunc_*_SQ8_FP16 assertions accordingly.
Register per-ISA microbenchmarks in bm_spaces_sq8_fp16.cpp, mirroring the SQ8_FP32 layout.

Spec: docs/superpowers/specs/2026-05-26-sq8-fp16-x86-kernels-design.md
Plan: docs/superpowers/plans/2026-05-26-sq8-fp16-x86-kernels.md

Test plan

`make clean ALL=1 && make build && make unit_test` — 2271 tests pass (release).
`make asan` — 2271 tests pass under AddressSanitizer.
Benchmark binary runs on hosts that lack AVX-512 (AVX-512 tier gracefully reports "AVX512F_BW_VL_VNNI not available"; AVX2+FMA / AVX2 / SSE4 / scalar tiers all execute and report timings).

🤖 Generated with Claude Code

Note

Medium Risk
Changes hot-path vector distance math and runtime ISA selection; incorrect dispatch or SIMD numerics could affect search quality, though tier tests assert ~0.01f agreement with scalar and builds gate intrinsics by CPU flags.

Overview
Adds x86 SIMD for asymmetric SQ8 (index) ↔ FP16 (query) distances: inner product, cosine, and L2², with scalar fallback unchanged for dim < 16 or missing ISA features.

Build: F16C-dependent code moves into separate objects (AVX2_F16C, AVX2_FMA_F16C, SSE4_F16C) compiled with -mf16c (and -mavx for SSE4) only when the toolchain reports F16C+FMA+AVX; base AVX2 / AVX2_FMA / SSE4 TUs stay F16C-free. AVX-512 SQ8↔FP16 kernels live in the existing AVX512F TU (uses _mm512_cvtph_ps, no separate F16C flag).

Runtime: IP_SQ8_FP16_GetDistFunc, Cosine_SQ8_FP16_GetDistFunc, and L2_SQ8_FP16_GetDistFunc select AVX-512F → AVX2+FMA+F16C → AVX2+F16C → SSE4.1+F16C+AVX, set SQ8 alignment hints when dim is a multiple of the chunk size, and otherwise return the scalar implementations.

Validation: Parameterized tests compare each tier to the scalar baseline; benchmarks register the same ISA ladder beside the naive kernels.

^{Reviewed by Cursor Bugbot for commit 3565985. Bugbot is set up for automated code reviews on this repo. Configure here.}

Captures the architecture, file-level plan, CMake F16C gating, and risk register for adding AVX-512 / AVX2+FMA / AVX2 / SSE4 kernels for the asymmetric SQ8 (storage) ↔ FP16 (query) distance functions, wiring them into the existing dispatcher tables and SQ8_FP16 unit/benchmark scaffolding from MOD-15141. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jit-ci · 2026-05-26T10:34:21Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

^{Security scan by Jit}

Enables _mm{,256}_cvtph_ps in the AVX2+FMA, AVX2, and SSE4 dispatcher translation units so the upcoming SQ8↔FP16 kernels can widen FP16 lanes to FP32. The flag is appended only when CXX_F16C is detected; existing SQ8_FP32 / SQ8_SQ8 / INT8 / UINT8 sources contain no F16C intrinsics so emitted code for those kernels is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Parameterised gtest fixture mirroring SQ8_FP32_SpacesOptimizationTest; currently asserts only the scalar fallback path. Per-tier SIMD assertion blocks (AVX-512, AVX2+FMA, AVX2, SSE4) are added alongside the kernel implementations in subsequent commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Implements asymmetric SQ8 (storage) ↔ FP16 (query) Inner Product, Cosine, and L2² kernels for the AVX-512 F+BW+VL+VNNI tier. Each chunk widens 16 SQ8 lanes via cvtepu8_epi32 + cvtepi32_ps and 16 FP16 lanes via _mm512_cvtph_ps, then fmadds into a 16-lane FP32 accumulator. SQ8 storage and FP16 query metadata reads use load_unaligned to tolerate odd dimensions. Dispatcher branches in IP_space.cpp / L2_space.cpp select the new Choose_SQ8_FP16_*_implementation_AVX512F_BW_VL_VNNI when features.avx512f && features.avx512bw && features.avx512vl && features.avx512vnni; otherwise behaviour is unchanged from MOD-15141. A parameterised gtest fixture exercises every residual class in [16, 32] against the scalar baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

8-wide AVX2+FMA kernels widen 8 SQ8 lanes via cvtepu8_epi32 + cvtepi32_ps and 8 FP16 lanes via _mm256_cvtph_ps, then fmadd into a 256-bit FP32 accumulator. Residual (< 8) lanes load the full 16-byte FP16 block, convert, then blend zero across unused lanes — mirroring the existing F16C FP16 kernel pattern. Dispatcher branch in {IP,Cosine,L2}_SQ8_FP16_GetDistFunc selects the new Choose_SQ8_FP16_*_implementation_AVX2_FMA when features.avx2 && features.fma3 && features.f16c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the AVX2+FMA kernels but uses _mm256_mul_ps + _mm256_add_ps instead of _mm256_fmadd_ps so it can run on Haswell-era AVX2 hardware without FMA support (uncommon but matches the existing SQ8_FP32 tiering). Dispatcher gate requires features.avx2 && features.f16c and runs between the AVX2+FMA and SSE4 tiers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

4-wide SSE4 kernels widen 4 SQ8 lanes via cvtepu8_epi32 + cvtepi32_ps and 4 FP16 lanes via _mm_cvtph_ps (F16C), then mul+add into a 128-bit FP32 accumulator (SSE4 has no FMA). Residual % 4 lanes are materialised via _mm_set_ps + the scalar FP16_to_FP32 helper, mirroring the existing SSE4 SQ8_FP32 residual pattern. Dispatcher gate requires features.sse4_1 && features.f16c && features.avx since F16C is VEX-encoded — matches the existing F16C/FP16 dispatcher gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The SQ8_FP16 GetDistFunc dispatcher now returns AVX-512 / AVX2+FMA / AVX2 / SSE4 SIMD kernels when the corresponding feature flags are set (only scalar previously). Updates the GetDistFunc_*_SQ8_FP16 asserts to compute the expected function for the host's highest supported tier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds AVX-512 / AVX2+FMA / AVX2 / SSE4 benchmark registrations to bm_spaces_sq8_fp16.cpp, mirroring the SQ8_FP32 layout. Gates each tier on the corresponding OPT_* defines plus the runtime feature checks that mirror the dispatcher in IP_space.cpp / L2_space.cpp. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-26T10:50:57Z

Codecov Report

❌ Patch coverage is 99.04762% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.05%. Comparing base (bbe9dfd) to head (3565985).

Files with missing lines	Patch %	Lines
src/VecSim/spaces/IP_space.cpp	95.23%	2 Missing ⚠️
src/VecSim/spaces/L2_space.cpp	95.23%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #970      +/-   ##
==========================================
+ Coverage   96.99%   97.05%   +0.05%     
==========================================
  Files         130      141      +11     
  Lines        7793     8105     +312     
==========================================
+ Hits         7559     7866     +307     
- Misses        234      239       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- CMake: gate `-mf16c` on CXX_F16C AND CXX_FMA AND CXX_AVX (matches OPT_F16C macro) and append `-mavx` to the SSE4 dispatcher when adding -mf16c, since F16C is VEX-encoded and requires AVX state. Mirrors the existing F16C.cpp recipe and prevents miscompiles on toolchains with F16C but without AVX. - IP_SSE4_SQ8_FP16.h: replace `*reinterpret_cast<const int32_t *>(pVect1)` with `load_unaligned<int32_t>(pVect1)` to remove strict-aliasing UB on the uint8_t SQ8 lane load. - IP_AVX2{,_FMA}_SQ8_FP16.h: improve the residual-mask comment to spell out the asymmetric-mask reasoning (SQ8 unmasked is safe because the FP16 query blend forces those FP32 query lanes to 0 → garbage·0=0). - IP_AVX{512,2,2_FMA,SSE4}_SQ8_FP16.h: add the `IP = min·y_sum + delta·Σ(q·y)` algebraic-identity comment header that AVX-512 already carried, plus a precondition note that callers must enforce dim >= 16 (matches the established SQ8_FP32 convention; no runtime assert because sibling SQ8_FP32 SIMD kernels also rely on the dispatcher gate). - test_spaces.cpp: route the SQ8_FP16 edge-case tests (ZeroQuery, ConstantStorage, MixedSignQuery) through {IP,Cosine,L2}_SQ8_FP16_GetDistFunc so the runtime-selected SIMD tier is actually exercised on those inputs, not just the scalar reference. - test_spaces.cpp: add SQ8_FP16_SIMD_HighDim suite with dims {64, 128, 256, 512, 1024} so multi-iteration do-while loop bugs would fire (the existing [16, 32] range covers at most two AVX-512 chunk iterations). - test_spaces.cpp: add SQ8_FP16_SIMD_TierCoverage.ReportTiersExercised — a single test that emits per-tier coverage to stderr and GTEST_SKIPs when no SIMD tier is available, so CI runners without AVX-512 do not silently report zero tier-1 coverage. - test_spaces.cpp: scalar-fallback `alignment` checks now seed the value with 0xFF and assert it remains 0xFF, verifying the dispatcher contract ("scalar leaves caller's value untouched") instead of just measuring that the variable's pre-zeroed init survived. - test_spaces.cpp: drop the stale MOD-15152/MOD-15153 wiring-TODO comment on SQ8_FP16_NoOptimizationSpacesTest now that the SIMD tiers are wired. - bm_spaces_sq8_fp16.cpp: drop the matching stale comment. Out of scope (separate ticket): two-accumulator FMA refactor (also affects SQ8_FP32) and the SSE4 residual `_mm_cvtph_ps` perf opportunity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Break the FMA / mul+add dependency chain in all four SQ8↔FP16 IP kernels by widening the inner loop to use multiple independent accumulators. L2 kernels inherit the change through their `…InnerProductImp_…` call. - IP_AVX512F_BW_VL_VNNI_SQ8_FP16.h: 1 → 4 accumulators, unroll-4 main loop (64 lanes/iter) with a 16-lane tail for the 0..3 remaining chunks. - IP_AVX2_FMA_SQ8_FP16.h, IP_AVX2_SQ8_FP16.h: 1 → 2 accumulators; the existing 2-step unrolled body now routes each step to an independent accumulator. The `residual >= 8` half-chunk feeds the second accumulator so the prologue also breaks the dependency chain. - IP_SSE4_SQ8_FP16.h: 1 → 2 accumulators; do-while unrolled 1 → 2 steps per iteration (4 → 8 lanes/iter). Residual-ladder steps alternate between sum_a and sum_b for prologue ILP. Correctness invariant: residual block consumes exactly `residual` lanes (0..15) → remaining tail is always a multiple of 16, so the unrolled loops (multiples of 8 / 16 / 64) terminate exactly. Verified by 131 SQ8_FP16 unit tests + 115 under ASan.

The SQ8↔FP16 AVX-512 kernel does not actually issue any VNNI instruction — the inner loop is FP32 FMA (`_mm512_fmadd_ps`) over lanes widened from SQ8 (`_mm512_cvtepu8_epi32` + `_mm512_cvtepi32_ps`) and FP16 (`_mm512_cvtph_ps`). Real VNNI use would require an integer-encoded query, which is a different kernel entirely. The file/function names are renamed to match what the kernel actually uses (AVX-512F). The dispatcher .cpp/.h files stay named after the runtime tier (AVX512F_BW_VL_VNNI) since the SQ8↔FP16 kernel still registers under that tier alongside the genuinely VNNI-using SQ8↔SQ8 / INT8 / UINT8 kernels — the gate is a CPU-feature gate, not an ISA claim. The same misnomer exists for SQ8↔FP32; tracked separately so the rename there can ship as its own commit. Also: fix a strict-aliasing-class UB introduced by the AVX-512 unroll-4 loop. `while (pVec1 + 64 <= pEnd1)` forms a pointer past one-past-end of the SQ8 storage object when fewer than 64 lane bytes remain, which is UB in C++ regardless of dereference. Switched to pointer subtraction (`static_cast<size_t>(pEnd1 - pVec1) >= 64`). Renames: - IP_AVX512F_BW_VL_VNNI_SQ8_FP16.h -> IP_AVX512F_SQ8_FP16.h - L2_AVX512F_BW_VL_VNNI_SQ8_FP16.h -> L2_AVX512F_SQ8_FP16.h - SQ8_FP16_{InnerProduct,Cosine,L2Sqr}SIMD16_AVX512F_BW_VL_VNNI -> _AVX512F - Choose_SQ8_FP16_{IP,Cosine,L2}_implementation_AVX512F_BW_VL_VNNI -> _AVX512F Verified: 131 SQ8_FP16 unit tests + 115 under ASan.

Design doc was added in ad941b8 for planning; not appropriate as a long-lived in-repo artifact. Keep externally (Confluence / scratch) rather than ship with the kernel commit.

Two trims, both restoring pre-existing patterns elsewhere in the file: 1. `GetDistFuncSQ8FP16Asymmetric` had grown a runtime SIMD-tier walk that duplicated coverage already provided by `SQ8_FP16_SpacesOptimizationTest`. Reduced to the bare dispatcher-equality check used by the FP32 / SQ8↔SQ8 sister tests at lines 540-548 and 551-559. 2. The `SQ8_FP16_EdgeCases` tests (`ZeroQueryTest`, `ConstantStorageTest`, `MixedSignQueryTest`) were routed through `{IP,Cosine,L2}_SQ8_FP16_GetDistFunc(dim, nullptr)` to force runtime SIMD dispatch on adversarial inputs. Reverted to direct scalar calls (`SQ8_FP16_InnerProduct`, etc.) — the original pre-fdc5c1cd shape. Coverage rationale: the SIMD kernels are branchless on input values (verified by grep — no value-dependent `if` in any tier). Every code path is therefore exercised by `SQ8_FP16_SpacesOptimizationTest`'s random inputs at multiple dims. The edge-case tests verify the *algebraic identity* (IP of zero query = 1.0, constant storage matches dequant baseline, mixed-sign handling) — scalar correctness on these inputs is what was actually being checked, and the SIMD path matches scalar via the SpacesOptimizationTest tier walk. Net: 77 lines removed from the test file, matches sister conventions, no coverage gap.

The SQ8↔FP16 kernels in the SSE4, AVX2, and AVX2+FMA tiers depend on F16C (`_mm_cvtph_ps` / `_mm256_cvtph_ps`), while every other kernel in those dispatcher TUs is F16C-clean. The previous arrangement mixed both under `#ifdef OPT_F16C` blocks inside the base dispatcher .cpp/.h files. Split each tier's F16C-dependent kernels off into a sibling TU: functions/SSE4.cpp → SSE4 + SQ8_FP32 (no F16C) functions/SSE4_F16C.cpp → SQ8_FP16 only (requires -mavx -mf16c) functions/AVX2.cpp → AVX2 + BF16 + SQ8_FP32 (no F16C) functions/AVX2_F16C.cpp → SQ8_FP16 only (requires -mf16c) functions/AVX2_FMA.cpp → SQ8_FP32 (no F16C) functions/AVX2_FMA_F16C.cpp → SQ8_FP16 only (requires -mf16c) The AVX-512 tier is unaffected — its SQ8_FP16 kernel uses `_mm512_cvtph_ps`, which is part of AVX-512F and not F16C. CMake now compiles each sibling TU conditionally on `_has_full_f16c` and applies the F16C flags only there. Base TUs no longer carry `-mf16c`, since they no longer reference F16C intrinsics. Result: - No `#ifdef OPT_F16C` directives in `functions/*.cpp` or `functions/*.h`. - Compile-time isolation: an F16C intrinsic accidentally added outside a `_F16C` sibling will fail to build, not silently miscompile. - Caller sites (`IP_space.cpp`, `L2_space.cpp`, `test_spaces.cpp`, `bm_spaces.h`) still gate the *calls* with `#ifdef OPT_F16C`; the new sibling .h includes are unconditional, since declarations alone don't link-error and the calls remain guarded. Verified: 131 SQ8_FP16 unit tests + 115 ASan + 1166 full test_spaces suite (covers other ISA tiers SQ8_FP32 / BF16 / INT8 / UINT8 to confirm no regression from the dispatcher restructure).

…[MOD-14954] Two related cleanups in the SQ8↔FP16 dispatch path: 1. The AVX-512 SQ8↔FP16 kernel only uses AVX-512F instructions (`_mm512_cvtph_ps`, `_mm512_fmadd_ps`, etc.) but was registered under the VNNI tier (`OPT_AVX512_F_BW_VL_VNNI` + check of avx512f/bw/vl/vnni). That meant CPUs with AVX-512F but no VNNI (Skylake-X, some Cascade Lake variants, etc.) would fall through to AVX2_FMA even though they can run the AVX-512 kernel. Moved the `Choose_SQ8_FP16_{IP,Cosine,L2}_implementation_AVX512F` definitions from `functions/AVX512F_BW_VL_VNNI.cpp` to `functions/AVX512F.cpp`, with matching header reshuffle. Dispatch sites now gate on `OPT_AVX512F` + `features.avx512f`. 2. F16C is a transversal requirement across the non-AVX-512 SQ8↔FP16 tiers (SSE4, AVX2, AVX2+FMA) — every one of them widens FP16 query lanes via `vcvtph2ps`. Per-tier nested `#ifdef OPT_F16C` was hoisted into a single outer block around the three ISA branches in `IP_SQ8_FP16_GetDistFunc`, `Cosine_SQ8_FP16_GetDistFunc`, and `L2_SQ8_FP16_GetDistFunc`. Verified: 131 SQ8_FP16 release + 115 ASan + 1166 full test_spaces suite.

Remove extraneous blank lines in SSE4 and AVX2_FMA source files, fix indentation in AVX512F SQ8_FP16 function signatures, and reformat benchmark macro invocation to fit line length conventions.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 839fe3c. Configure here.}

cursor · 2026-05-28T11:53:06Z

+        any_simd = true;
+    } else {
+        std::cerr << "[SQ8_FP16] AVX-512 F+BW+VL+VNNI tier NOT exercised on this host\n";
+    }


Tier coverage test checks wrong AVX-512 preprocessor guard

Low Severity

The ReportTiersExercised test gates the AVX-512 check on OPT_AVX512_F_BW_VL_VNNI, but the SQ8_FP16 kernels are dispatched under OPT_AVX512F (as seen in IP_space.cpp and L2_space.cpp and in the SQ8_FP16_SpacesOptimizationTest at line 3097). On CPUs with AVX-512F but without BW+VL+VNNI, the SIMD optimization tests correctly exercise the AVX-512F tier, but this coverage reporter incorrectly omits it — potentially causing any_simd to remain false and the test to spuriously skip.

^{Reviewed by Cursor Bugbot for commit 839fe3c. Configure here.}

The comments referencing SQ8-to-FP16 dispatch location are no longer accurate after the recent refactoring that moved the dispatch logic. Clean up these stale comments from the AVX512F_BW_VL_VNNI files.

dor-forer added the benchmarks-all label May 26, 2026

dor-forer and others added 8 commits May 26, 2026 13:39

dor-forer force-pushed the dor-forer-sq8-fp16-x86-kernels-mod-14954 branch from a43eac4 to e21cb3b Compare May 26, 2026 10:42

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread src/VecSim/spaces/CMakeLists.txt Outdated

dor-forer and others added 9 commits May 26, 2026 14:07

Reformat SQ8↔FP16 SIMD kernels for consistent line breaks

4c8828e

Remove SQ8↔FP16 design doc from PR [MOD-14954]

fe69f85

Design doc was added in ad941b8 for planning; not appropriate as a long-lived in-repo artifact. Keep externally (Confluence / scratch) rather than ship with the kernel commit.

Clean up whitespace and formatting inconsistencies

839fe3c

Remove extraneous blank lines in SSE4 and AVX2_FMA source files, fix indentation in AVX512F SQ8_FP16 function signatures, and reformat benchmark macro invocation to fit line length conventions.

cursor Bot reviewed May 28, 2026

View reviewed changes

Remove obsolete SQ8-to-FP16 dispatch comments

3565985

The comments referencing SQ8-to-FP16 dispatch location are no longer accurate after the recent refactoring that moved the dispatch logic. Clean up these stale comments from the AVX512F_BW_VL_VNNI files.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SQ8↔FP16 x86 SIMD distance kernels [MOD-14954]#970

Add SQ8↔FP16 x86 SIMD distance kernels [MOD-14954]#970
dor-forer wants to merge 19 commits into
mainfrom
dor-forer-sq8-fp16-x86-kernels-mod-14954

dor-forer commented May 26, 2026 •

edited by cursor Bot

Loading

Uh oh!

jit-ci Bot commented May 26, 2026 •

edited

Loading

Uh oh!

codecov Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dor-forer commented May 26, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

jit-ci Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🛡️ Jit Security Scan Results

Uh oh!

codecov Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 28, 2026

Choose a reason for hiding this comment

Tier coverage test checks wrong AVX-512 preprocessor guard

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dor-forer commented May 26, 2026 •

edited by cursor Bot

Loading

jit-ci Bot commented May 26, 2026 •

edited

Loading

codecov Bot commented May 26, 2026 •

edited

Loading