Short-circuit HNSW search for similarity-based vector queries? #15869

### Description

Spinoff from #15836

A KNN query [short-circuits the HNSW search](https://github.com/apache/lucene/blob/83e3f9ac24ac282ae353d0e0566f64640fe919a3/lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java#L345) if the "expected" number of nodes visited is >= the number of nodes accepted by the filter.

A similarity-based vector query (i.e. `[Byte|Float]VectorSimilarityQuery`) attempts to find _all_ vectors with a score above a threshold (for Euclidean similarity, this can be imagined as all vectors within a radius of the query vector).

Assuming document vectors are evenly spread out across the n-dimensional space, should vector similarity scores form a normal distribution?

If so, can we estimate the proportion of nodes visited using the area under the curve of a normal distribution (from `resultSimilarity` to `∞`), and then apply the same short-circuit logic?
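To make the idea concrete, here is a rough sketch of what such an estimate could look like. It is not existing Lucene code: the class and method names are hypothetical, and it assumes the mean and standard deviation of similarity scores against the query are already known (e.g. from sampling a few vectors), which is itself an open question. The normal tail is computed with the Abramowitz-Stegun 7.1.26 approximation of `erf`, since the JDK has no built-in error function.

```java
/** Hypothetical sketch, not part of Lucene: estimates matches above a similarity threshold. */
final class SimilarityShortCircuitSketch {

  private SimilarityShortCircuitSketch() {}

  /**
   * Upper-tail probability P(Z > z) of a standard normal variable, using the
   * Abramowitz-Stegun 7.1.26 approximation of erf (absolute error around 1.5e-7).
   */
  static double upperTail(double z) {
    if (z < 0) {
      return 1.0 - upperTail(-z); // symmetry of the normal distribution
    }
    double x = z / Math.sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * x);
    double poly =
        t * (0.254829592
            + t * (-0.284496736
                + t * (1.421413741
                    + t * (-1.453152027
                        + t * 1.061405429))));
    double erfc = poly * Math.exp(-x * x); // approximates erfc(x) for x >= 0
    return 0.5 * erfc;
  }

  /**
   * Expected number of vectors scoring above resultSimilarity, ASSUMING scores are
   * normally distributed with the given mean and standard deviation.
   */
  static long expectedMatches(
      double mean, double stdDev, double resultSimilarity, long numVectors) {
    if (stdDev <= 0) {
      // Degenerate case: every score equals the mean.
      return mean >= resultSimilarity ? numVectors : 0;
    }
    double z = (resultSimilarity - mean) / stdDev;
    return Math.round(upperTail(z) * numVectors);
  }

  /** Mirrors the KNN-style check: skip the graph if we expect to visit at least the filtered set. */
  static boolean shouldSkipGraphSearch(long expectedMatches, long filteredCount) {
    return expectedMatches >= filteredCount;
  }
}
```

For example, with an assumed score mean of 0.5, standard deviation of 0.1, and `resultSimilarity = 0.7` over 1,000,000 vectors, the threshold sits two standard deviations above the mean, so roughly 2.3% of vectors (about 22,750) would be expected to match. If the filter accepts only 10,000 documents, the check above would favor skipping the graph search. The number of matches is only a lower bound on nodes visited, so a real heuristic would likely need an extra factor on top.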
